Depending on the distribution of data in rows, or even the amount of rows in a table, using ORs can drastically affect performance, since the DB can switch from choosing to do an Index Seek, or do an Index Scan. This is not always bad, since depending on data on rows, sometimes it's better, but in this case, we can avoid them when using ORs by replacing them by UNION ALL.
For instance, let's check this scenario:
We have the table Subscriptions has more than 5m rows, and has 3 indexes (among others):
PK {SubscriptionID}
IX {PolicyNumber}
IX {AssociationNumber}
IX {IsInternal, PolicyNumber}
And this query:
select * from Subscriptions where IsInternal = '1' and (PolicyNumber like @sSubscriptionNu or AssociationNumber like @sSubscriptionNu)You see the OR in the predicate? If we execute this query, and check the execution plan, the engine cannot use the IXs, but uses directly the PK, causing an index scan:
Ok, lets work on the query then, removing the OR by using an UNION ALL:
select * from (select SubscriptionID, IsInternal from Subscriptions where PolicyNumber like @sSubscriptionNu union all select SubscriptionID, IsInternal from Subscriptions where AssociationNumber like @sSubscriptionNu ) S where S.IsInternal = '1'And let's check the execution plan now:
At first sight (even when it looks more complex) we can see an improvement in the execution plans. The first one costs 75% of the batch, and the second one 25%. Now, let's compare the results of both versions (old and new) using the SQL Profiler:
Of course, there are a lot of other scenarios where ORs is used and can be replaced by an UNION ALL, but the idea es the same. It is just a bit more complicated to refactor the SQL code.
As we saw, replacing the OR by an UNION ALL is better, but not always. If we execute the same query over an empty table (an extreme case, I know) the OR is better. But since a normal operation in a production environment would be with the table filled with a lot of rows, using an UNION ALL is prefered. So we can say that the UNION ALL is more scalable than the OR, since any change in the statistics, table content, or even a simple change in the query can turn a efficient query in a sluggish one.

No comments:
Post a Comment