4/20/13

Indexes in Teradata

Teradata as we all know is a Relational Database Management System (RDBMS) for the world's largest commercial databases. It is the market leader in  Data Warehousing. Its architecture is so designed that it takes advantage of the concept of parallelism.

Almost every Database has what we know as KEY i.e. Primary Key, Foreign Key etc defined while implementing the DDL of a table. But in Teradata we don't have any concept of Keys instead what we have is INDEX. Indexes are conceptually different from keys. PRIMARY KEY is relational modelling term that defines, in the logical model whereas the PRIMARY INDEX is a physical database implementation term that defines the actual columns used to distribute and access rows in a table.

What is an Index?


In the Teradata RDBMS, an index is used to define row uniqueness and retrieve data rows, it also can be used to enforce the primary key and unique constraint for a table. The Teradata RDBMS support five types of indexes:

* Unique Primary Index (UPI)
* Unique Secondary Index (USI)
* Non-Unique Primary Index (NUPI)
* Non-Unique Secondary Index (NUSI)
* Join Index

A typical index contains two fields:

* A Value
* A Pointer to instances of the value in a data table

Teradata uses the index and hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash.

The following rules apply to the indexes:

* As mentioned above index help in distribution and retrieval of rows of a table. It can be made up of one or more than one column.
* A table can have a number of indexes, including 1 Primary Index and more than 32 Secondary Index.
* An index defined for a table can be primary or secondary, and can be unique or non-unique. Each type of index affects the performance of the system and also the data integrity.
* An index is generally defined on columns of a table which are generally used in WHERE or join condition.
* Index helps in enforcing PRIMARY KEY and UNIQUE constraint.


Primary Index



Primary Index determines the distribution of the rows on the disks controlled AMPs. A primary index in Teradata RDBMS is required for row distribution and storage. When a row is inserted, its hash code is calculated using a hashing algorithm and depending on the whether the index is UNIQUE or Non-UNIQUE it is checked for duplication of those index. Rows having the same hash value are stored on the AMP.




Secondary Index



In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined to a table. Comparing to primary indexes, Secondary indexes allow an alternate path to access the rows in the table which is less frequently used. Basically a Secondary Index is a subtable that is stored in all AMPs separately from the primary table. The subtable made and maintained by the system contains the following information:

* RowIDs of the subtable rows
* Base table index column values
* RowIDs of the base table rows

Below diagram shows how the secondary index subtable on each AMP is associated with the base table by the RowIDs:


Join Index



A join index is nothing but an indexing structure which contains columns from multiple tables. Rather than having to join individual tables each time the join  operation is needed, the query can be resolved via a join index and in most cases dramatically improves performance.

SYNTAX:

CREATE JOIN INDEX [<database-name>.]<index_name>
[[NO] FALLBACK]
AS SELECT
[<columns>]
[SUM numeric-expression]
[COUNT coulmn-expression]
[EXTRACT year|month from date-expressino]
FROM [<database-name(s).]<table-names>]
[WHERE <search-condition>]
[GROUP BY <coulmn-name>]
[ORDER BY <coulmn-name>]
PRIMARY INDEX(<column-name>)
[index <column-name> ORDER by HASH|VALUES]
;


SHARE THIS POST:

Related Posts:

  • SQL Data Definition Language (DDL) Tutorial After writing a post on Data Manipulation Language (DML), here is another tutorial in SQL series on DDL i.e. Data Definition Language. DDL (Data Definition Language) statements are used to build and modify the structur… Read More
  • SQL Data Control Language (DCL) Tutorial After doing tutorials on DML (Data Manipulation Language) and DDL (Data Definition Language), we will be learning about one more language used in SQL called the Data Control Language (DCL). As we saw in our previous tutor… Read More
  • GROUP BY vs DISTINCT in SQL This question has to come my mind many times and also I have been asked the same question many time. That what should be used GROUP BY or DISTINCT, as both perform the same thing. So, after researching and using both stat… Read More
  • SQL Tutorial on Data Manipulation Language (DML) After writing a tutorial on SQL Baiscs, long back, here I am writing again on SQL, but this time on DML i.e. Data Manipulation Language. By the full form of DML we understand that with the help of this we will be able… Read More
  • Variables and Constants in PL/SQL This is out first of many tutorial in the PL/SQL series. Today we will be discussing about variables and coonstants in PL/SQL. PL/SQL stands for Procedural Language to Structured Query Language. We can use PL/SQL in the … Read More

1 comment:

  1. Thank you for graet article, this give good explanation ablot the index. very formely created the type of the index, thank you so much, keep sharing such good information.

    ReplyDelete