Tuesday, August 26, 2014

JOINs and Indexes

A couple of days ago, when performing some tests to check some timeouts in the DB, I discovered that the JOIN being done had no indexes in neither of both ends. So the first idea was to create both indexes, and then I was asked "Is it necessary to create index in both tables when doing a join between them?"

My first answer was "Yes, it's a good practice to do that". But the question kept hanging in my head: is it REALLY necessary to do so? So I went directly to the MSSMS to find out.

This is the scenario:
(Yes, no FK between them, and cannot be created because there were Child.ParentId that had no corresponding Parent.ParentId, strangely. No, the affected rows cannot be deleted)


Indexes are the following:

Table Child (3.8m rows approx)

  Index_name   Index_description Index_keys
IX_Child_Description nonclustered located on PRIMARY Description
PK_Child clustered, unique, primary key located on PRIMARY ChildID

Table Parent (5.6 m rows approx)

  Index_name   Index_description Index_keys
IX_Parent_Description nonclustered located on PRIMARY Description



No index on ParentId on neither of both tables. In a normal scenario, since the FK would exist, a Unique index on Parent.ParentID would also exist. 

The query was similar to this one:
select * from Child c
inner join Parent p on p.ParentID = c.ParentID
where p.Description = 'XXXXXXXXXXX'
So, as you can see the predicate is filtering the Parent table only. Let's see the resulting execution plan for this query:



There is a Index Scan being done on the Child table, because there is no index on the Child.ParentId column (Even the MSSMS suggests creating an index on that column). Let's first create and index on Parent.ParentID then, and re execute the query:
create unique clustered index PX_Parent on Parent(ParentId)

Better, but still an Index Scan on the Child table. Let's create the index and re execute the query:
create index IX_Child_ParentId on Child(ParentId)
Now, an Index Seek on the new index is being done. So, both indexes were used. In this scenario, creating both was useful. Ok, so far, I've said nothing new. But, what if I change the query like this:
select * from Child c
inner join Parent p on p.ParentID = c.ParentID
where c.Description = 'XXXXXXXXXXX'
So instead of filtering on the Parent table, I filter on the Child table. Is the index on Child.ParentId still useful? Let see...


No, the new index was not used. So, should I drop it, the execution plan should remain the same:
drop index Child.IX_Child_CommonId
The resulting execution plan:


So, the index is used only if I filter first on the Parent table, and then the retrieved rows are used to filter in the Child table. In this scenario, creating both indexes wasn't necessary. 

Summarizing, creating indexes on both tables when joining is a good practice, yes, but not always practical. If the scenario would have been the second query, creating an index only in the Parent.ParentId table would have solved the problem. Have in mind that this could differ depending on the rows on the table, statistics, different predicates, etc...

In the end, one would think that creating both indexes seems a good idea. Truth be told, is up to the dev to check if it's really necessary. Disk space could be an important constraint, if the table has a large amount of rows. In another context, for instance in a "chaotic" environment in which queries change a lot, it would be a good idea to create both indexes "just in case" to avoid potential performance issues ... As always, is up to the DBA to decide whether to create both indexes or not.

No comments:

Post a Comment