Showing posts with label sql tutorial. Show all posts
Showing posts with label sql tutorial. Show all posts

10/19/13

Primary Index Choice Criteria

We have already seen Tutorial Primary Index and we have understood how Primary Index work and how they us in maximizing performance.

Today we will be learning on how to choose Primary Index/Indexes in a given table. But before we move ahead in defining the criteria for choice of Primary Index, here is a Tip:

TIP

If you don't define any Index on a Table, then Teradata decides on its own. Its makes decision on the below points:

1. If you have any column with Primary Key Constraint in the table definition, Teradata will make column as Unique Primary Index.

2. If you have a/many columns with Unique Constraint, then Teradata will choose only the 1st column in the table definition as Unique Primary Index & others as Unique Secondary Index.

3. If you don't have column with defined as either Primary key or with Unique Constraint, then Teradata makes the 1st column in the table definition as Non Unique Primary Index.

Primary Index Choice Criteria

Now lets start with the Criteria's of Choosing a good Primary Index. Basically, there are three Primary Index Choice Criteria: Access Demographics, Distribution Demographics,  and Volatility.

Access Demographics

By Access Demographics we mean, those columns which were used by the user to access the table, i.e. columns used in the WHERE clause of the SQL Statement. So choose the column(s) which were most frequently used for access to maximize the number of one-AMP operation. We need to consider both value as well as join access

Distribution Demographics

So as we know from our previous post on Primary Indexes, More unique the index, the better the distribution. Optimizing distribution optimizes parallel-processing.

Volatilty

You must understood if you know the meaning of the word Volatile. This point means that we have to choose a column which will have a low change rate. The Primary Index should not at all be volatile because any changes in the PI will result in heavy I/O overhead i.e. as a result in change the PI has to be moved from one AMP to another. Therefore, we have to choose a column which will have stable values.

**Note**

There is a trade-off between the access & distribution demographics. The most desirable situation is to find a column which has a good access and good distribution demographics.


5/27/13

Tutorial on Primary Index in Teradata

Last tutorial was on Indexes in Teradata in which we covered Primary Index, Secondary Index and Join Index. What we saw was just a summary of all the Indexes.

This tutorial will help you in understanding Primary Index in a much clearer way. As we saw last time a Primary Index determines the distribution of the rows  on the disks controlled AMPs. A primary index in Teradata in required for row distribution and storage. 

Below image shows how the distribution takes place using the Primary Indexes: 



The PI for a table should represent the data values used by SQL to access the data for the table. Therefore, a PI for a table should be selected very carefully as it can improve or degrade the performance of SQL used for accessing that table. 

Rules for Defining a PRIMARY INDEX



Below are some rules that I have come across while identifying on which column to be defined as PI:

* The PI selected should be such that it has unique index as more the unique index more evenly the rows will be distributed by the AMPs and better will be the space utilization.
* Define the index on as few columns as you can. 
* The PI defined can have unique or non-unique values. If defined as unique then you will have to make sure that the each value you pass into the column is unique and never repeated whereas if defined non-unique you can pass duplicate values into that column.
* Flags and Junk Dimensions are never to be defined as PI as it leads to more skewness. Therefore that data in skewed and when you perform a join it can hamper the performance and lead to spool space error.

NOTE: 

If you forget to define PRIMARY INDEX while doing a CREATE TABLE, the default will be use i.e. on the following:

* PRIMARY key
* First UNIQUE constraint
* First column

Creating PRIMARY INDEX


Unique primary index in created using the (UNIQUE) PRIMARY INDEX clause in the CREATE TABLE statement. Non-unique primary index are also created in the same way just removing the (UNIQUE) clause.

NOTE : Once a PRIMARY INDEX in created on TABLE it cannot be dropped or modified, the index must be changed by recreating the TABLE.

Example: Creating a UNIQUE PRIMARY INDEX


CREATE TABLE EMPLOYEE
(EMP_ID INTEGER
, EMP_NAME CHAR(15)
, EMP_GRADE CHAR(1))
 UNIQUE PRIMARY INDEX (EMP_ID);

Example: Creating a NON-UNIQUE PRIMARY INDEX


CREATE TABLE STUDENT
(STUDENT_ID CHAR(15)
, STUDENT_NAME CHAR(20)
, DEPARTMENT CHAR(10)) 
PRIMARY INDEX (STUDENT_ID);


Data Access using PI




When a query contains a WHERE clause, and it has primary index value(s), the request in then processed by hashing the values to find the AMP where the row is stored and then the row which contains the same hash value in the RowID part of it is returned.


Primary Key vs Primary Index



I have seen many people who are working on Teradata still confuse between Primary Key and Primary Index. So here I have done a small comparison between both as both differ conceptually:


Term
Primary Key
Primary Index
Requirement Not required, unless referential integrity checks are to be performed Required
Defining Define by CREATE TABLE statement Defined by CREATE TABLE statement
Uniqueness Unique Unique or non-unique
Function Identifies a row uniquely Distributes rows
Values can be changed? No Yes
Can be null? No Yes
Related to access path? No Yes

Hope this tutorial and the above short comparison helps everyone to understand what exactly PRIMARY INDEX is and how is should be chosen and defined. Do let me know your views about the post and help me in improving it.


4/20/13

Indexes in Teradata

Teradata as we all know is a Relational Database Management System (RDBMS) for the world's largest commercial databases. It is the market leader in  Data Warehousing. Its architecture is so designed that it takes advantage of the concept of parallelism.

Almost every Database has what we know as KEY i.e. Primary Key, Foreign Key etc defined while implementing the DDL of a table. But in Teradata we don't have any concept of Keys instead what we have is INDEX. Indexes are conceptually different from keys. PRIMARY KEY is relational modelling term that defines, in the logical model whereas the PRIMARY INDEX is a physical database implementation term that defines the actual columns used to distribute and access rows in a table.

What is an Index?


In the Teradata RDBMS, an index is used to define row uniqueness and retrieve data rows, it also can be used to enforce the primary key and unique constraint for a table. The Teradata RDBMS support five types of indexes:

* Unique Primary Index (UPI)
* Unique Secondary Index (USI)
* Non-Unique Primary Index (NUPI)
* Non-Unique Secondary Index (NUSI)
* Join Index

A typical index contains two fields:

* A Value
* A Pointer to instances of the value in a data table

Teradata uses the index and hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash.

The following rules apply to the indexes:

* As mentioned above index help in distribution and retrieval of rows of a table. It can be made up of one or more than one column.
* A table can have a number of indexes, including 1 Primary Index and more than 32 Secondary Index.
* An index defined for a table can be primary or secondary, and can be unique or non-unique. Each type of index affects the performance of the system and also the data integrity.
* An index is generally defined on columns of a table which are generally used in WHERE or join condition.
* Index helps in enforcing PRIMARY KEY and UNIQUE constraint.


Primary Index



Primary Index determines the distribution of the rows on the disks controlled AMPs. A primary index in Teradata RDBMS is required for row distribution and storage. When a row is inserted, its hash code is calculated using a hashing algorithm and depending on the whether the index is UNIQUE or Non-UNIQUE it is checked for duplication of those index. Rows having the same hash value are stored on the AMP.




Secondary Index



In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined to a table. Comparing to primary indexes, Secondary indexes allow an alternate path to access the rows in the table which is less frequently used. Basically a Secondary Index is a subtable that is stored in all AMPs separately from the primary table. The subtable made and maintained by the system contains the following information:

* RowIDs of the subtable rows
* Base table index column values
* RowIDs of the base table rows

Below diagram shows how the secondary index subtable on each AMP is associated with the base table by the RowIDs:


Join Index



A join index is nothing but an indexing structure which contains columns from multiple tables. Rather than having to join individual tables each time the join  operation is needed, the query can be resolved via a join index and in most cases dramatically improves performance.

SYNTAX:

CREATE JOIN INDEX [<database-name>.]<index_name>
[[NO] FALLBACK]
AS SELECT
[<columns>]
[SUM numeric-expression]
[COUNT coulmn-expression]
[EXTRACT year|month from date-expressino]
FROM [<database-name(s).]<table-names>]
[WHERE <search-condition>]
[GROUP BY <coulmn-name>]
[ORDER BY <coulmn-name>]
PRIMARY INDEX(<column-name>)
[index <column-name> ORDER by HASH|VALUES]
;


8/19/12

Iterative Statements in PL/SQL

In the last post on PL/SQL, we had seen Conditional Statements and how they can be used. Today we will be learning about Iterative Statements in PL/SQL.


As we all know the types of Iterative statements that are there in a programming language. PL/SQL also has the same iterative statements and the functioning is also the same, but the only thing different is their syntax


So lets not waste any time and move on to the topic.



7/25/12

Conditional Statements in PL/SQL

Conditional Statements in PL/SQL are same as conditional statements available in java, c++, c or any other language. PL/SQL allows processing of data using the following conditional statements:

    * IF
    * CASE


7/6/12

GROUP BY vs DISTINCT in SQL

This question has to come my mind many times and also I have been asked the same question many time. That what should be used GROUP BY or DISTINCT, as both perform the same thing.


So, after researching and using both statements and checking the performance using some scripts from here and there, I have come to the conclusion that GROUP BY performs much better then DISTINCT.

Now before, we move on to more discussion, lets first understand what does actually both do.


6/30/12

Variables and Constants in PL/SQL

This is out first of many tutorial in the PL/SQL series. Today we will be discussing about variables and coonstants in PL/SQL.

PL/SQL stands for Procedural Language to Structured Query Language. We can use PL/SQL in the Oracle relational databases, in Oracle Server, and in client-side application development tools, such as Oracle Forms.

PL/SQL has features such as : 
* Data abstraction
* Data encapsulation
* Object-Orientation
* Exception-handling.

We will be using Oracle 9i, in our tutorials for PL/SQL.



6/26/12

SQL Data Control Language (DCL) Tutorial

After doing tutorials on DML (Data Manipulation Language) and DDL (Data Definition Language), we will be learning about one more language used in SQL called the Data Control Language (DCL).

As we saw in our previous tutorials on SQL, that with the help of DDL we can define tables, databases, views etc in SQL and with the help of DML we can manipulate those existing tables, views etc. So apart from these there are other language(s) like the DCL which help us to give or take rights to a user. 

Basically, DCL is used to create roles, permissions, and referential integrity as well it is used to control access to database by securing it.

The DCL commands that helps in achieving the above are GRANT and REVOKE.

Only Database Administrator's or owner's of the database object can provide/remove privileges on a databse object.


6/23/12

SQL Data Definition Language (DDL) Tutorial

After writing a post on Data Manipulation Language (DML), here is another tutorial in SQL series on DDL i.e. Data Definition Language.

DDL (Data Definition Language) statements are used to build and modify the structure of your tables and other objects in the database. When you execute a DDL statement, it takes effect immediately. Some of the DDL commands are CREATE, ALTER, DROP, USE. 

* Another DDL statement is used to define the referential integrity, which we will see later in the tutorial.


6/15/12

SQL Tutorial on Data Manipulation Language (DML)

After writing a tutorial on SQL Baiscs, long back, here I am writing again on SQL, but this time on DML i.e. Data Manipulation Language.

By the full form of DML we understand that with the help of this we will be able to manipulate data. Now what kind of manipulation can we do might be the next question in your mind, so SQL supports manipulation like INSERT, UPDATE, DELETE and MERGE


2/4/11

SQL : Structured Query Language (basics)

SQL stands for Structured Query Language is a database computer language used for managing data when used with application  in a Relational Database Management System (RDBMS).

There are many database software available which used the same query language. Some of them are Oracle, My SQL, DB2 etc. 
As the name suggest Structures Query language so it uses QUERY for retrieval of data from the database, and that query has a structure which is same for all databases software.