Thursday, June 26, 2014

Replacing ORs by using UNION ALL

A common problem I often find in SPs is the use of an OR in the WHERE statements. It's very common for the business ask to be able to get something given certain conditions, that can affect several fields on a table, like getting a person by his name OR by his last name (or both). Although it seems that OR is easiest way to solve this (and yes, it is, from a human POV), the main problem using them in the predicates is that is not escalable.

Depending on the distribution of data in rows, or even the amount of rows in a table, using ORs  can drastically affect performance, since the DB can switch from choosing to do an Index Seek, or do an Index Scan. This is not always bad, since depending on data on rows, sometimes it's better, but in this case, we can avoid them when using ORs by replacing them by UNION ALL. 


For instance, let's check this scenario:


We have the table Subscriptions has more than 5m rows, and has 3 indexes (among others): 


PK {SubscriptionID}
IX {PolicyNumber}
IX {AssociationNumber}
IX {IsInternal, PolicyNumber}

And this query:
select *
from 
 Subscriptions
where IsInternal = '1'                 
and (PolicyNumber like @sSubscriptionNu or 
     AssociationNumber like @sSubscriptionNu) 
You see the OR in the predicate? If we execute this query, and check the execution plan, the engine cannot use the IXs, but uses directly the PK, causing an index scan:

Ok, lets work on the query then, removing the OR by using an UNION ALL:
select *
from 
     (select SubscriptionID, IsInternal from Subscriptions 
      where PolicyNumber like @sSubscriptionNu 
      union all
      select SubscriptionID, IsInternal from Subscriptions 
      where AssociationNumber like @sSubscriptionNu
      ) S
where S.IsInternal = '1'
And let's check the execution plan now:

At first sight (even when it looks more complex) we can see an improvement in the execution plans. The first one costs 75% of the batch, and the second one 25%. Now, let's compare the results of both versions (old and new) using the SQL Profiler:


Even here, we can see the logical reads have drop from 25599 to 379! Great improvement, it is.

Of course, there are a lot of other scenarios where ORs is used and can be replaced by an UNION ALL, but the idea es the same. It is just a bit more complicated to refactor the SQL code.


As we saw, replacing the OR by an UNION ALL is better, but not always. If we execute the same query over an empty table (an extreme case, I know) the OR is better. But since a normal operation in a production environment would be with the table filled with a lot of rows, using an UNION ALL is prefered. So we can say that the UNION ALL is more scalable than the OR, since any change in the statistics, table content, or even a simple change in the query can turn a efficient query in a sluggish one.


View Leonardo Esmoris's profile on LinkedIn