Index
- A Relational Database Management System (RDBMS) generally use an Index of a Table to quickly access desired row(s) in that table.
- A database index can change each time the corresponding data is changed.
- Indices are stored in additional data structures called Index Pages, (which are similar to Data Pages).
- For each indexed row there is an index entry, which is stored in an index page.
- Each index entry consists of the index key plus a pointer.
- Each index entry is miniscule in size compared to its corresponding row. Thus, the number of index entries per Index Page is significantly higher than the number of rows per Data Page. Now, this implies that the number of IO operations required to traverse the index pages is significantly lower in comparison to the number of IO operations required to traverse corresponding data pages.
- Table scan implies more IO operations, poor performance.
Index scan implies less IO operations, superior performance. - Any RDBMS uses B+ Tree data-structure for constructing an index.
- SQL Server has 2 kinds of indexes:
- Clustered Index
- Non-clustered Index
- In Microsoft's jargon, a Heap is a table without a clustered index.
- Can you create a non-clustered index on a Heap? Yes, one or more non-clustered indexes can be created on a heap.
- Most tables should have an appropriate clustered index. There are exceptional use-cases where you have good reasons to leave a table as a Heap.
- When a table is stored as a heap, individual rows are identified by its row identifier (RID). The RID is a small and efficient structure consisting of the file number, data page number, and slot on the page (FileID:PageID:SlotID).
- Indexes are automatically created when PRIMARY KEY, FOREIGN KEY and UNIQUE constraints are defined on table columns.
- Creating a Primary Key, automatically creates a Clustered Index if it doesn't exist already. If the table already has a Clustered Index, a Non-Clustered Index is created to enforce the Primary Key.
- Creation of Foreign Key and Unique constraints, automatically creates a non-clustered index
Clustered Index
- With a Clustered Index, the tables' rows are stored physically on the disk, in the same order as its clustered index.
- Clustered Index and data are stored together, in that, the leaf, level of the clustered index contains the actual data rows of the table.
- A table or view is allowed only one clustered index at a time. Why? Because the rows of a table cannot be physically ordered in more than one way.
- If you define a Primary Key for a table, a clustered index is implicitly defined, unless the table has one already.
- If a clustered index is defined for a table, the table is called a Clustered Table.
- If a clustered index is defined for a view, the view is called an Indexed View.
- Note: Creating a clustered index on a view, physically materializes the view.
- Each cluster index is unique by default.
- What happens if you were to define a clustered index on a column that has duplicate values? If a clustered index is built on a non-unique column, SQL Server will force uniqueness by adding a 4-byte identifier to the rows that have duplicate values.
Non-Clustered Index
- A non-clustered index creates an index that specifies the logical ordering of a table.
- With a non-clustered index, the physical order of the data rows is independent of their indexed order.
- A table can have more than one non-clustered indexes.
- Accessing data using non-clustered index in a Clustered Table, means traversal of the non-clustered index, followed by a jump to traverse corresponding clustered index.
- Accessing data using non-clustered index in a Non-Clustered Table (aka Heap), means traversal of the non-clustered index, followed by retrieval of a row using its RID.