9/4/12

What is a Data Warehouse?

A very exciting and one of the hottest topic currently in the IT world is DATA WAREHOUSE. The tools which are used for the same are among some of the hot skills example INFORMATICA, TERADATA etc.





From what we know Data warehouse is nothing but a relational database which is designed for query and analysis. It usually contains history data derived from transaction data.

But as per  William Inmon, Data Warehouse definition is :

Data Warehouse is a subject-oriented, integrated, nonvolatile and time-variant collection of data in support of management's decisions.

The above definition helps us understand what exactly the term Data Warehouse means. So lets take each point into consideration and try to understand each of them.


Subject Oriented



Data warehouse helps us to analyze data. Say for example, If we want to learn more about the company's profit margin, we can have a warehouse having data about the profit in diff region, products, projects etc. So using this data we can understand and answer questions on which department made the most profit. Therefore, the ability to define a Data Warehouse with respect to a subject matter is called subject oriented.


Integrated



Integration of data is closely related to subject orientation. We should take all data from disparate sources and then push the data into a single consistent format. In this process we should also take care of the errors and faulty data. Thus when we achieve this we say it is integrated.


Nonvolatile



As we all know what nonvolatile means, and the same meaning applies here also. Once data has been pushed into Data Warehouse it should not be changed because the purpose of a  Data Warehouse  is to analyse data by comparing the historical data with the current data.


Time Variant



Data Warehouse  concentrates on changes over time where a OLTP system just dumps the historical data into archiving. This time variant capability is managed my SCD (Slowly Moving Dimension) types.

People might be still think what is OLTP ?

So OLTP stands for Online Transaction processing. The systems that are used in any booking system or in technical terms "OLTP, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing"

Now the next questions arrive what is the difference between OLTP and Data Warehouse?

There are many differences between the two, so we will list down some of the important differences :

* The most important difference of all is : OLTP is normally in 3NF (3rd Normalized Form) whereas Data Warehousing is not in 3NF. Therefore, we can also infer that OLTP won't have any kind of Data Redundancy.

* Data warehouse is used to store months and years of data to support historical analysis, whereas OLTP system store data for a few weeks or months. Therefore the sizes of the DB also has a vast difference. OLTP uses 100MB - 100GB where a Data Warehouse uses 100GB- few terabytes.

* The highly normalized structure of the OLTP helps it to optimize the operations such as UPDATE/INSERT/DELETE, where Data Warehouse has a very de-normalized structure (Star Schema) to optimize query performance.

* Data in Data Warehouse is pushed on regular basis by ETL process and end user do not update the data warehouse directly whereas in OLTP systems, end users routinely issue individual data modification statements to the database and thus the OLTP system is up to date.

These are few important differences between OLTP and Data Warehouse. We will be covering Data Warehouse Architecture in the next post related Data Warehouse.

SHARE THIS POST:

4 comments: