SqlTutorial-1 : Lesson-9 : Class-1
Microsoft SQL Server Training:
Free Online-Learning Classes (OLC) for T-SQL
Overview of Sql-INDEX Operations and Optimizations


Index is a performance optimization technique that speeds up the data retrieval process. It is a persistent data structure (Key-Pointer) that associated with a Table (or View) in order to increases performance during retrieving the data from that Table (or View).
D atabase-Table is good enough to store business data but it is a lack of meta-information needed to search its data more quickly and efficiently. This behavior of table might not affect Search-Performance and might be ignorable if either the table contains minimal set of records, or you always querying table without WHERE-filters or both. However, in real-world databases none of the scenario is supportive. There could be thousands of records in a table, and you need to extract the information with complex queries having various WHERE-filters including Sub-queries WHERE-filters too. This will take a long time and causes a significant performance impact. Therefore, Indexes are come into the picture. You can create about 250 indexes over a single table based on columns which frequently used in WHERE-filters of queries. These columns called Search-Keys.

Microsoft SQL Server Training Online Learning Classes INDEX Overview,  Optimizations , Clustered and Non-clustered Indexes
A table without any Indexes stores the data into an un-organized memory structure called Heap. When you execute a query against heap even using WHERE-filter, SQL-Server searches entire table from 1st row to last-row sequentially. This searching process is called Table-Scan where as the search over against an Indexed Table called Index Lookups. The queries that use Indexed columns in WHERE-filter are called Covering Queries.

Indexes in database are much similar to Indexes in the Book. They describe book contents in two ways: Topic-wise (called Book-Contents put at front of the book) and Keyword-wise (called Book-Index put at back of book): For example, if you want to find out the pages of book where particular topic is discussed, then refer the Index which contains topic name in alphabetic order along with page Numbers. A book-page can contain multiple topics or a topic can spread across multiple book-pages. Similarly a databases-page (a.k.a., bucket) can contain multiple records or a single record can spread across multiple pages if too long, the maximum size of database-Page: 8kb and maximum-record size is based on data types of column e.g., only a single xml-type column can contain 2Gb of data.

In databases there are two types of Indexes: Clustered and Non-Clustered Indexes. The clustered index is similar to Topic-Wise contents of the book in which topics are sorted sequentially in order of the page numbers; here Table-rows are sorted in Clustered Index-Key order, typically Primary Key order. A book can organize the pages in one sequential order similarly a table organizes their records according to a Clustered-Index search-key order. Hence, a table can have at-most one Clustered Index.

The Database-Index also contains Search-keys along with Pages where records are available. Database-page is a collection of records access as whole for retrieval and persisting data to and from databases.

Database-Index contains Keys [column(s) of Table or View] and these Keys stored in a tree-like structure called Balanced-Tree (B+ -Trees). This B-Tree organization of Keys gives two advantages:
  1. Key-by-Key search instead of Record-by-record search. 
  2. Dynamic search instead of linear or sequentially search.
The primary advantage as well as disadvantage of Indexes is performance. Because, one-hand indexes speed ups data retrieval from database, but on other-hand they slow-downs INSERT, DELETE and UPDATE operations since these operations should be reflect on both table as well as Index-structures. By default SQL server creates Clustered Index on primary key, this is the reason why you can’t modify he primary key-value. The reasoning of additional index is depends on 2-factors:
  1. Frequency of columns that uses in SELECT-WHERE-clauses: 
  2. Frequency of columns that modified or deleted UPDATE-WHERE and DELETE-WHERE.
Note that frequency of large data type columns such as Binary-data type, Xml data type columns might be zero. Therefore, it is not advisable to create index on such types of columns.

The key-role of DBA is to determine and design proper Index that should optimize for your database. DBA needs to balance the between frequency of Data-Retrieval and Data-Modifications.


INDEX
Overview
  1. What are indexes?What are their advantages, disadvantage and Restrictions
  2. Index Aspects: Cans and Can'ts of Indexes
  3. Types: Clustered Vs Non-Clustered
  4. How Indexes help PERFORMANCE Tuning and When should Indexes  avoided?
  5. FAQs


I. Overview

1. INDEXes; Avantages, Disadvantage and Restrictions

a) Definition

Index is a performance optimization technique that speeds up the data retrieval process. It is a persistent data structure (Key-Pointer) that associated with a Table (or View) in order to increases performance during retrieving the data from that Table (or View).

b) Advantages and disadvantages

The main advantage as well as dis-advantage of Index mechanism is PERFORMANCE. Because, it speed ups the data retrieval process but slow-downs the data modifications process performance. Updating a table with Indexes takes more time than updating without Index, because SQL server needs an additional updates to INDEX entry also. Therefore, before creating explicit index you have to balance the frequency of Data-Retrieval and Data-Modifications on Search-key.

Advantages:

Indexes gives two advantages:
  1. Performance optimization i.e., to speed up data search in database. 
  2. To Enforcing uniqueness on non-key columns. i.e., A column which is neither a PRIMARY key nor a UNIQUE key, can be restricted as it can contain only distinct values.

Disadvantages:

  1. Performance: Adding indexes (both clustered and non-clustered) slow down insert, delete, update operations as data has to be updated in indexes and indexes possibly re-structured.
  2. Extra Memory consumption: The amount of space that they require will depend on the columns in the index, and the number of rows in the table.

c) Restrictions

  • A table can have maximum 250 index, in which 1-clustered and 249 non-clustered.
  • Up to 16-columns can be combined into a single composite index key.
  • You can INCLUDE non-key columns from 1 - 1,023 columns to non-clustered index.
  • The max size of combined/single column is 900 bytes(not 9kb) and size(page)=8060 bytes(8kb).
  • You cannot create index on large object (LOB) data type columns such as ntext, text, varchar, binary etc. However, you can INCLUDE these columns to non-clustered indexes. For indexing on Text, ntext, and image etc use FULL TEXT INDEXES instead NORMAL INDEXES.


2. Index Aspects: Cans and Can'ts of Indexes

A) Basic Aspects of Indexes :

  • When an Index is created, it first sorts the records of tables and then assigns ROWID to each row.
  • Indexes also enforces UNIQUE constraints, i.e., without using UNIQUE constraints you can make a column as it can contains only distinct values. This can be done by creating UNQIUE-index on that column.
  • Indexes are persistent Database Objects. Therefore, they allocate memory and stored by a global visible name.
  • Indexes are independent and isolated to their associated tables or views. Therefore, you can create additional Indexes on table and you can alter, rebuild, reorganize and even delete Indexes without affecting base tables. But when you delete base table or view, all associated indexes get deleted automatically.
  • SQL-Server uses Indexes with all DML operations that use WHERE-Filter e.g., SELECT-FROM-WHERE, UPDATE-SET-WHERE and also DELETE-FROM-WHERE statements.
  • SQL-server perform full-table-scan instead of Index-lookups when DML operations not uses WHERE-filter SELECT-FROM, UPDATE-SET, DELETE-FROM. 
  • Updating a table with Indexes takes more time than updating without Index, because SQL server needs an additional updates to INDEX entry also. Therefore, before creating explicit index you have to balance the frequency of Data-Retrieval and Data-Modifications on Search-key.
  • You can create at most 250-indexes on a view or table. 1-Clustered and 249-Non-Clustered.
  • Index is organized in tree-like structure called B-Trees which always maintains same depth to all pages.

B) What INDEX-CANs:

  • Indexes are created against Table as well as against Views and Global Temporary Tables.
  • An index can be created on empty as well as non-empty table.
  • You can create duplicate Indexes, i.e., two Indexes on same column. This is not recommended and must be analyzed and avoided as this regrets performance significantly.
  • Indexes (both clustered and Non-Clustered) can be created on either single column or composite columns even having Xml columns or binary etc data types but not recommended. However, Sql Server introduces a new type of indexing called Xml-Indexing for xml type-of columns.
  • You can create clustered and non-clustered index on same single column simultaneously. In this case, Non-clustered index becomes useless and SQL server always uses clustered index instead.
  • You can create Clustered Index and Primary Key on different columns. In this case Primary Key uses Non-clustered Indexes, but the condition is Clustered
  • Index must be created before adding Primary key to table.
  • You can define Index of Table in different databases. For this use fully-qualified database name while defining Indexes:

    [database_name.[ schema_name ].| schema_name. ] table_or_view_name

C) What INDEXes-CAN'Ts :

  • Two indexes with same name even against different tables are not allowed.
  • It is not possible that a table having primary key but not Clustered Index but reverse is possible i.e., a table can exists with Clustered Index but without primary key.
  • A single Index cannot be associated with multiple tables. For example, you cannot an Index with composite columns taking from different tables.
  • You cannot create index on Non-key columns i.e., the Non-primary-Key or Non-Unique-Key column are invalid for creating indexes. However, you can Include those columns.
  • For a view, you cannot create Index on View columns defined by Expression or fx(n).

3.INDEX-Types : Clustered Vs Non-Clustered

There are two tyeps of INDEXes : Clustered and Non-Clustered.
By default Clustered Index is created when you add Primary Key to table  and Non-clustered Indexes are created when you add Unique-keys to tables.

A. Clustered INDEX

  1. Clustered index organizes the table records in sequential order based on its key. (Typically, Index key is primary key, however DBA can select other unique-keys for this while defining clustered index).
    You can verify this as follows:
    • Suppose, If you want insert the records in descending order (default ASC); then
      create Clustered index on primary key with DESC order.
    • If you don’t want the order and need to insert the records in ad-hoc fashion,
      just don’t create Clustered index and create table without Primary key.
  2. SQL Server creates Clustered INDEX automatically if not exists, when you add Primary Key to Table.
  3. Bottom, or leaf level of the clustered index structure contains the actual data rows of the table
  4. A table can contain at most one Clustered Index.
  5. Clustered indexes do not add included columns. Because, it implicitly adds all columns of the base table. Clustered indexes are defined within the table itself not outside of the table.

B. Non-Clustered INDEX

  1. A non-clustered index does not enforces order of table rows, instead it adds a additional pointers to actual data.
  2. SQL server 2005 allows up to 249 non-clustered index per table. Whereas SQL Server 2008 allows up to 999 non-clustered indexes per table.
  3. Each non-clustered index can contain upto 1023 included columns.
  4. Non-clustered indexes are created outside of the database table and contain a sorted list of references to the table itself.
  5. Foreign keys should be non-clustered.
  6. It is similar to an index in the back of a book.

C. Clustered Vs Non-Clustered

The main difference between Clustered and Non-Clustered Indexes are based on following:
  1. Physical Sorting: The clustered index sorts data sequentially based on index-key whereas Non-Clustered Index does not instead it refers the data according to secondary-keys.
  2. Location : The clustered must be in same file-group/page where associated table avialable whereas Non-clustered index can exists in different file-group/page even in different databases from its associated Table.
  3. Existance: A table can contain at most one Clustered Index whereas Non-clustered indexes can be 249/SQLServer-2005 and 999/SQLServer-2008
  4. Performance and priority: Clustered index performs faster than non-clustered index, therefore if both clustered and non-clustered index are created on same column(or key), then SQL Server uses Clustered Index
  5. Data Availability:  As far as concerned to their structures, both clustered index and non-clustered indexes organizes their search-key values in B-Trees, but Clustered Index contain Pages in leaf nodes whereas Non-clustered Index contains references to pages instead of actual data.
  6. INCLUDED-columns: Clustered Index does not contains INCLUDED columns whereas Non-clsutered can contain in order to add Pages to its leaf-nodes directly.


4. How Indexes help PERFORMANCE Tuning

The main advantage as well as disadvantage of Indexing is Performance. There are two reasons that degrades the performance when Indexes are applied?
  • Frequent DML operations: , particularly Insert and delete operation. These operations cause to re-organize the tree, and updating nearly 249 index entries. 
  • Poor designing of indexes: It includes
    1. Creating indexes on frequently modifying columns. (It causes modifications to indexes).
    2. Defining un-necessary non-clustered indexes. (All are loaded and used un-necessarily) As a thumb-rule, create non-clustered indexes on columns that are frequently referenced in WHERE clause, JOIN and GROUP BY clauses.
    3. Creating index on large keys specially Char keys (As Each character get compared) to keep index keys Short and use possibly integer type columns)
      Clustered index is desired when you extract large result sets and Non-clustered index is used to access small amount of data.

A: Use Clustered index when

  1. Choose the column which is infrequently modified for clustered index.Typically primary key. 
  2. Sequential access to a table. 
  3. Range of data using BETWEEN, >= & =<, >&< in WHERE clause.
  4. Frequently joining with other tables based on a particular column. 
  5. ORDER BY clauses or GROUP BY clauses.
  6. Clustered indexes are not advised to be used for columns that are updated frequently

B: Use Non-clustered Index when

  1. Queries that do not return large result sets 
  2. Columns that frequently in WHERE clause with equality condition.
  3. If a table is updated often and queried less so, you should carefully consider the benefits and drawbacks of adding such an index.

C: When should indexes be avoided?

Some times indexes creates performance overheads. The following are situations when we don't want indexes no more.
  1. If your tables is small contains and contains minal no. of records
  2. If your table is frequently modified by large updates or insert operations.
  3. If your column NULL-cardinality is high i.e., contains multiple NULL values then Indexes not required.

5. INDEX - FAQs

  1. What are Indexes? How they are different from Tables.

    Index is a performance optimization technique that speeds up the data retrieval process. The following are some difference between Tables and Indexes:
    1. Tables contains actual records whereas Indexes contains Tables meta-information in order to speed up data-retrieval
    2. A Table can have multiple INDEXes but an Index can associated with one and only one Table
    3. Deleting of Tables deletes all associated Indexes whereas reverse is not
  2. What Primary-Indexes (Clustered) and Secondary Indexes(Non-Clustered)

    See Clustered and Non-clustered indexes and difference between them as told in above section.
  3. How many Indexes can a table contain

    SQL-Server 2005 supports 250 indexes; 1-Clustered and 249-Non-Clustered. But SQL-Server 2008 supports 1000 Indexes; 1-Clustered and 999-Non-Clustered.
  4. What are Auto-generated Indexes

    The clustered Index on Primary-Key and Non-Clustered Indexes over Unique are autogenerated Indexes.
  5. Can we create primary-key and Clustered Index on different columns?

    Yes, we can create primary-key and clustered Index on different columns. For this you have to Create primary key with NON-CLUSTERED specification.

    Example

    CREATE TABLE Table1 (
         Id INT NOT NULL PRIMARY KEY NONCLUSTERED,
         Col1 varchar NOT NULL
    )
  6. Main advantage of Index

    Performance tuning; See advantages and disadvantages section for details
  7. How Indexes are physically organized?What are B+ trees?

    Indexe-keys are physically organized as a tree-like structrue called B-Trees? B-Trees stands for Balanced-Tree, that maintains same depth to all its leaf nodes.
  8. What is fill factor? How can we minimize the index-tree re-organization

    Fill factor is the percentage of additional memory allocated to leaf-nodes of Index B-Trees. It is greate measure to avoid index re-rebuilt or re-organization while INSERTs and DELETEs performance. At down-side it allocates additional memory for each leaf-nodes.
  9. What is Heap, Table-Scan and Index-lookups.

    Heaps:
    Ad-hoc Memory allocation for table data which is un-organized and un-sorted. Typically the memory location for Tables havingno Clustered Index is called heap. Heap needs Table scan for data access.
    Table Scan: Process of searching through every row in the table (i.e., every page and records) is said to be Table scan. SQL Server performs table scan on in two situations:
    1. Heaps i.e., there are no Clustered-indexes on table.
    2. Your query contains no WHERE-filter e.g., SELECT * FROM TableName;
    Index Lookups: Index Lookups is the alternative process to Table Scan; in which either Clustered or Non-clustered index is used to search the desired records. The queries with WHERE conditions need Index Lookups to speed up the search process. Indexing allows efficient data lookups.
  10. What are Index-Coverage and Covering Query

    Index Coverage: A covering index is a special case where the non-clustered index itself contains required data field(s) and can return the data. This can be accomplished by adding required INCLUDED columns.

    Example

    CREATE INDEX Indx1_Emp ON Employee(ID) INCLUDE Employee(name)
    // The above index is desired for following query.
    Select name from Employee where Id=101;
    // But the above index is un-usable for query like
    Select * from Employee where Id=101;

    Covering query: A query that uses columns on which indexes are created. This scenario results in superior query performance.

1 comment: