4/20/13

Indexes in Teradata

Teradata as we all know is a Relational Database Management System (RDBMS) for the world's largest commercial databases. It is the market leader in  Data Warehousing. Its architecture is so designed that it takes advantage of the concept of parallelism.

Almost every Database has what we know as KEY i.e. Primary Key, Foreign Key etc defined while implementing the DDL of a table. But in Teradata we don't have any concept of Keys instead what we have is INDEX. Indexes are conceptually different from keys. PRIMARY KEY is relational modelling term that defines, in the logical model whereas the PRIMARY INDEX is a physical database implementation term that defines the actual columns used to distribute and access rows in a table.

What is an Index?


In the Teradata RDBMS, an index is used to define row uniqueness and retrieve data rows, it also can be used to enforce the primary key and unique constraint for a table. The Teradata RDBMS support five types of indexes:

* Unique Primary Index (UPI)
* Unique Secondary Index (USI)
* Non-Unique Primary Index (NUPI)
* Non-Unique Secondary Index (NUSI)
* Join Index

A typical index contains two fields:

* A Value
* A Pointer to instances of the value in a data table

Teradata uses the index and hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash.

The following rules apply to the indexes:

* As mentioned above index help in distribution and retrieval of rows of a table. It can be made up of one or more than one column.
* A table can have a number of indexes, including 1 Primary Index and more than 32 Secondary Index.
* An index defined for a table can be primary or secondary, and can be unique or non-unique. Each type of index affects the performance of the system and also the data integrity.
* An index is generally defined on columns of a table which are generally used in WHERE or join condition.
* Index helps in enforcing PRIMARY KEY and UNIQUE constraint.


Primary Index



Primary Index determines the distribution of the rows on the disks controlled AMPs. A primary index in Teradata RDBMS is required for row distribution and storage. When a row is inserted, its hash code is calculated using a hashing algorithm and depending on the whether the index is UNIQUE or Non-UNIQUE it is checked for duplication of those index. Rows having the same hash value are stored on the AMP.




Secondary Index



In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined to a table. Comparing to primary indexes, Secondary indexes allow an alternate path to access the rows in the table which is less frequently used. Basically a Secondary Index is a subtable that is stored in all AMPs separately from the primary table. The subtable made and maintained by the system contains the following information:

* RowIDs of the subtable rows
* Base table index column values
* RowIDs of the base table rows

Below diagram shows how the secondary index subtable on each AMP is associated with the base table by the RowIDs:


Join Index



A join index is nothing but an indexing structure which contains columns from multiple tables. Rather than having to join individual tables each time the join  operation is needed, the query can be resolved via a join index and in most cases dramatically improves performance.

SYNTAX:

CREATE JOIN INDEX [<database-name>.]<index_name>
[[NO] FALLBACK]
AS SELECT
[<columns>]
[SUM numeric-expression]
[COUNT coulmn-expression]
[EXTRACT year|month from date-expressino]
FROM [<database-name(s).]<table-names>]
[WHERE <search-condition>]
[GROUP BY <coulmn-name>]
[ORDER BY <coulmn-name>]
PRIMARY INDEX(<column-name>)
[index <column-name> ORDER by HASH|VALUES]
;