Thank you to Dev Nambi (B|T) for hosting this month's T-SQL Tuesday Party. For those of you that do not know T-SQL Tuesday was started by Adam Machanic (B|T) in 2009. If you feel like hosting 1 of these monthly events drop Adam a line. This months topic is on Assumptions we make about Indexes.
Working with SQL Server one of our main aims is to be able to provide a system that returns our results in a fast and optimal fashion. How do we achieve this? There are many factors involved in this particular question but for this months T-SQL Tuesday article I am only looking at Indexing.
Now we all know that we can only have one Clustered Index (CI) per table and not always but usually this is also our Primary Key for the table. Having this CI created on our table ensures that the data is sorted and stored physically in an order that is based on the key values that make up the Clustered Key. This may mean if your CI is based on an Int or BigInt and using an Identity then the data would be sorted in an increasing numerical fashion. Having this CI on its own is not the be all and end all to meet our requirements of providing a system to return our results in a fast and optimal fashion.
To go along with our CI we can create numerous Non Clustered Indexes (NCI) to help improve the performance of our queries or processes that are running in our SQL Server environment. Now a NCI is an additional structure that is created that contains the key values that make up our NCI along with a pointer for each key value to the data row that contains the key value so that we can hopefully retrieve the query result quickly.
Armed with this knowledge that the creation of a CI and numerous NCI’s on our table this should improve our performance out of site right? This is where my article starts talking about assumptions that are made. There are additional factors that we need to know about indexes for us to make Informed decisions around the indexes that are needed in our environments to achieve that optimal performance.
Something that we need to understand and look at is the utilisation of the table that we are indexing. Is the utilisation heavily used for write operations, read operations or an even mix of both? We need to know this because having an index can improve the query performance for the queries that utilise the key values in the index but there is a cost involved with that. Every time that we do an Insert, Update or delete the operation needs to occur not only on our CI (if one exists) or the Heap (No CI exists) but also for our NCI’s. This increase in write operations can have an impact on the performance of your system.
If your system undertakes a lot of Inserts, Updates and Deletes (which is normal) then there is a high likelihood of fragmentation in the indexes. Fragmented indexes lead to a degradation of performance and we are no longer providing that optimal system. We can get around this by implementing index maintenance jobs to regularly reduce the fragmentation to our CI and NCI’s.
We have created a CI and accompanied it with some NCI’s to improve our performance and we have our index maintenance working to keep our system performing optimally. So where are the assumptions? Well a couple of years ago I was on a client site and was asked if I could have a chat with the developer as they were having some performance problems with one of the production databases. After some initial discussions with the developer to get an idea of the performance issues being experienced it came to light that the database was the production OLTP database and it was very busy with Inserts, Updates and Delete, but it was also being used for all of the management reporting. Upon closer investigation I found that the database which was 210 GB in size only contained 30 GB of data. Yes you read that right there was 180 GB of Indexes on 30 GB of data.
Assuming that you will receive great performance if every report query has its own index specific will not guarantee a good outcome. In this case the variations of NCI’s provided some much overhead from an index maintenance as well as the increased IO on the OLTP database was utilising excessive resources.
Indexes are a good thing but don’t assume every index created is good for your system. Know your system and ensure the potential impact on your environment does not out way the benefits you hope to gain.