Pages

Wednesday, July 28, 2010

Use a Data Warehouse for Reporting

Businesses of all types and sizes use database systems, called Online Transaction Processing (OLTP) systems, to store and manage their transactions. These OLTP systems, when well designed, store data in relational tables in a highly normalized format, that is, tables are linked to each other so as to:
  • Avoid duplication/redundancy of data
  • Minimize space requirements
  • Enable fast retrieval and updates of these transactions
OLTP systems are very efficient at storing millions of transactions. However, when building reports that need to aggregate this data (for example, calculating sum, average, percentage etc), things start to slow down. The problem becomes more pronounced as we start to filter on various criteria, such as "Total $ amount of sales for last year", or "Total $ amount of sales of Mountain Bikes in California during Q3 of last year".

This is because aggregation is being done on the fly for each query, and combined with the additional filters (WHERE clauses), quickly starts eating up our SQL Server resources. Not only are the reports slow, but other business users experience significant system slowdowns as well.

A Data Warehouse (DW) on the other hand, which is part of an Online Analytical Processing (OLAP) system, is built for one purpose alone - fast, efficient reporting.

In a DW system, data from the OLTP system will be read, de-normalized, and (typically) stored in a Star Schema, as a set of Dimension and Fact tables. This is often done using SQL Server Integration Services (SSIS). From these tables, tools within SQL Server Analysis Services (SSAS) can be used to build cubes, which contain not only our data but also its pre-calculated aggregates based upon definitions that we specify within SSAS. This results in many advantages:
  • Super-fast reporting
  • No impact on the OLTP system
  • Data Mining (DM) and Business Intelligence (BI) solutions can be readily built
The DW is not synchronized in real-time with the OLTP system. It is loaded at regular intervals, which may be as varied as once a month or several times a day, depending on the business requirements. SSIS and SSAS enable us to create packages that can be scheduled to run automatically (say, nightly), simplifying the time-consuming activities of loading of the Data Warehouse and building the Cubes.

No comments:

Post a Comment