Sunday, February 24, 2013

SQL 2012 - Columnstore Indexes


Columnstore indexes in the SQL Server Database Engine can be used to significantly speed-up the processing time of common data warehousing queries. Typical data warehousing workloads involve summarizing large amounts of data. The techniques typically used in data warehousing and decision support systems to improve performance are pre-computed summary tables, indexed views, OLAP cubes, and so on. Although these can greatly speed up query processing, these techniques can be inflexible, difficult to maintain, and must be designed specifically for each query problem.
For example, consider a fact table F1 with dimension key columns dk1 and dk2. Let M be an aggregate function such as SUM. Instead of calculating M over column dk1 every time that a query referring to M(dk1) is run, a summary table F2(dk1, M) can be created and used so the result can be pre-computed and the query can execute faster. However, if a new query referring to M(dk2) is needed, a new summary table F3(dk2, M) with this information has to be created. When the number of columns in a table increases and with many possible functions, this approach becomes difficult to maintain and does not easily cover all the queries that are needed.
This overhead can be significant for the user. By using SQL Server columnstore indexes, users can reduce the overhead of the other solutions. Columnstore indexes also enable queries to compute the results quickly so that pre-computation is not required.
The key characteristics of SQL Server column store technology are as follows:
  • Columnar data format – Unlike the traditional row based organization of data (called rowstore format), in columnar database systems such as SQL Server with columnstore indexes, data is grouped and stored one column at a time. SQL Server query processing can take advantage of the new data layout and significantly improve query execution time.
  • Faster query results – Columnstore indexes can produce faster results for the following reasons:
    • Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.
    • Columns are heavily compressed. This reduces the number of bytes that must be read and moved.
    • Most queries do not touch all columns of the table. Therefore, many columns will never be brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.
    • Advanced query execution technology processes chunks of columns called batches in a streamlined manner, reducing CPU usage.
  • Key columns – There is no concept of key columns in a columnstore index so the limitation on the number of key columns in an index (16) does not apply to columnstore indexes.
  • Clustered index key – If a base table is a clustered index, all columns in the clustering key must be present in the nonclustered columnstore index. If a column in the clustering key is not listed in the create index statement, it will be added to the columnstore index automatically.
  • Partitioning – Columnstore indexes works with table partitioning. No change to the table partitioning syntax is required. A columnstore index on a partitioned table must be partition-aligned with the base table. Therefore a nonclustered columnstore index can only be created on a partitioned table if the partitioning column is one of the columns in the columnstore index.
  • Record Size – The index key record size limitation of 900 bytes also does not apply to columnstore indexes.
  • Query processing – Along with the columnstore index, SQL Server introduces batch processing to take advantage of the columnar orientation of the data. The columnstore structure and batch processing both contribute to performance gains, but investigating performance issues might be more complex than if only one factor were involved.
  • Table cannot be updated – For SQL Server 2012, a table with a columnstore index cannot be updated.

  •  
Syntax of ColumnStore Index


CREATE [ NONCLUSTERED ] COLUMNSTORE INDEX index_name
    ON <object> ( column  [ ,...n ] )
    [ WITH ( <column_index_option> [ ,...n ] ) ]
    [ ON {
           { partition_scheme_name ( column_name ) }
           | filegroup_name
           | "default"
         }
    ]
[ ; ]

<object> ::=
{
    [database_name. [schema_name ] . | schema_name . ]
     table_name
{

<column_index_option> ::=
{
      DROP_EXISTING = { ON | OFF }
    | MAXDOP = max_degree_of_parallelism
}
  

Examples:

Creating a simple nonclustered index

CREATE TABLE SimpleTable
(ProductKey [int] NOT NULL,
OrderDateKey [int] NOT NULL,
DueDateKey [int] NOT NULL,
ShipDateKey [int] NOT NULL);
GO
CREATE CLUSTERED INDEX cl_simple ON SimpleTable (ProductKey);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csindx_simple
ON SimpleTable
(OrderDateKey, DueDateKey, ShipDateKey);
GO

Creating a simple nonclustered index using all options

CREATE NONCLUSTERED COLUMNSTORE INDEX csindx_simple
ON SimpleTable
(OrderDateKey, DueDateKey, ShipDateKey)
WITH (DROP_EXISTING =  ON,
    MAXDOP = 2)
ON "default"
GO

More about columnstore Indexes....

Indexes can be created on a temporary table. When the table is dropped or the session ends, the indexes are dropped.
The common business data types can be included in a columnstore index. The following data types can be included in a columnstore index.
  • char and varchar
  • nchar and nvarchar (except varchar(max) and nvarchar(max))
  • decimal (and numeric) (Except with precision greater than 18 digits.)
  • int, bigint, smallint, and tinyint
  • float (and real)
  • bit
  • money and smallmoney
  • All date and time data types (except datetimeoffset with scale greater than 2)
The following data types cannot be included in a columnstore index.
  • binary and varbinary
  • ntext, text, and image
  • varchar(max) and nvarchar(max)
  • uniqueidentifier
  • rowversion (and timestamp)
  • sql_variant
  • decimal (and numeric) with precision greater than 18 digits
  • datetimeoffset with scale greater than 2
  • CLR types (hierarchyid and spatial types)
  • xml
Basic Restrictions
A columnstore index:
  • Cannot have more than 1024 columns.
  • Cannot be clustered. Only nonclustered columnstore indexes are available.
  • Cannot be a unique index.
  • Cannot be created on a view or indexed view.
  • Cannot include a sparse column.
  • Cannot act as a primary key or a foreign key.
  • Cannot be changed using the ALTER INDEX statement. Drop and re-create the columnstore index instead. (You can use ALTER INDEX to disable and rebuild a columnstore index.)
  • Cannot be created by with the INCLUDE keyword.
  • Cannot include the ASC or DESC keywords for sorting the index. Columnstore indexes are ordered according to the compression algorithms. Sorting would eliminate many of the performance benefits.
Columnstore indexes cannot be combined with the following features:
  • Page and row compression, and vardecimal storage format (A columnstore index is already compressed in a different format.)
  • Replication
  • Change tracking
  • Change data capture
  • Filestream

No comments:

Post a Comment