Limitations on ColumnStore Indexes

ColumnStore index is a new type of index available in SQL 2012 and has been enhanced in SQL 2014 version. ColumnStore index stores the data column wise instead of a regular row based B tree format and hence, it has better ability to compresses the data. When it was first released in SQL 2012, it has several limitations and some of those have been addressed in later versions of SQL(SQL 2014).

Quick pointers on limitations on ColumnStore Indexes in SQL 2012 and 2014:

  • NonClustered columnstore does not require the table to have a clustered index.
  • In SQL 2014, Clustered ColumnStore index is the only index that can exist on the table.
  • In SQL 2014, Clustered ColumnStore Index is updateable. Data can be inserted,updated and deleted from the table.
  • In SQL 2012/14 – NonClustered ColumnStore Index are NOT updateable. One solution is, to disable the columnstore Index and do the data load and then re enable the columnstore index.
  • Varchar(max),nvarchar(max),xml, ntext,text,image etc are not suitable datatypes for Columnstore index.
  • Foreign Key, Unique constraints cannot be defined.
  • In SQL 2012, there is no clustered columnstore option only NonClustered ColumnStore index.
  • In SQL 2014, we can create clustered columnstore or NonClustered ColumnStore index. However, if the table has clustered columnstore index, we cannot create any other index.

I will add more points as I come across them.

Advertisement

Index Selectivity – Left Based

There are several types of indexes available but I will be referring to only clustered and non clustered indexes.

Clustered Index is a type of index where the logical and physical order of the table records are in the same order.

Non Clustered Index is a type of index where the logical and physical order of the table records are NOT in the same order. This index typically consists of specific columns defined in the index creation unlike clustered index which applies to the entire base table.

You can refer to BOL link here

There can only be one clustered index and up to 249 non clustered indexes on the table. All non clustered indexes will also include either cluster index key or ROWID depending on whether the table is clustered or heap that act as pointer back to base table. This pointer helps in creating a “look up” if additional columns are need to be fetched from the base table by the query.

The basic purpose of Indexes is to improve the performance of SQL Query and hence the design of indexes is very important as selectivity of index depends on how well the index and query logic is defined.

Query optimizer uses indexes and performs either scan or seek operation on them to fetch the data.Generally, seek is better for fetching single rows and small subset of rows,scan is better for pulling large set of rows.

While it is possible to tell the query optimizer to either use seek or scan using query hints, it is generally best practice to leave it up to the query optimizer and better investigate further to see why the optimizer is using not-so optimal plan(parameter sniffing,statistics update etc). It is also important to look the index definitions and the way queries are begin written.

I am using Adventure Works 2012 for this example,and you can download it from here.In AdventureWorks2012 database,there is a table called Person.Person and it has non clustered non unique index on LastName,FirstName,MiddleName.

Consider the below example:

Select * from Person.Person where FirstName='Gail' and MiddleName='A'
This performs a non clustered index scan but since the query returns only very small subset of data, an index seek would be an ideal operation here. The reason to do index scan is because the index is defined over LastName,FirstName and MiddleName but the query predicate is filtering only on FirstName and MiddleName. The Execution plan looks like this:

NC_IndexScan

Select * from Person.Person where LastName='Erickson' and MiddleName='A' This leads to a non clustered index seek which is an optimal plan.

NC_IndexSeek

So,the take away from this blog is if the left most index columns are used in the query predicate, the query optimizer comes up with an optimal plan. If the left the most column is not used in the query predicate, it will always lead to index scan whether it is optimal or not. That’s why it is important to have right indexes defined so the query optimizer can make best use of the index.