Thursday, July 29, 2010

The DW-DM-BI acronym and terminology guide

The BI field is full of acronyms - many are common to the industry itself, while others are specific to the Microsoft solution (incidentally, Microsoft used to be famous for their TLAs - Three Letter Acronyms - but lately we are seeing some increased usage of four letter acronyms).

AMOAnalysis Management Objects
BIBusiness Intelligence
BIDSBusiness Intelligence Development Studio
CubeA logical object that stores dimensions and facts and provides a multidimensional view of the data (often used interchangeably with UDM)
DMData Mining; also used for Data Mart
DMXData Mining Extensions (language)
DTSData Transformation Services (generally refers to legacy SQL Server 2000 packages)
DWData Warehousing, or a Data Warehouse
ETLExtract, Transform, Load
KPIKey Performance Indicator
MDXMultidimensional Expressions (language)
MOLAPMultidimensional OLAP
OLAPOnline Analytical Processing (also see HOLAP, MOLAP, ROLAP)
OLTPOnline Transaction Processing
ROLAPRelational OLAP
SSASSQL Server Analysis Services
SSISSQL Server Integration Services
SSRSSQL Server Reporting Services
UDMUnified Dimensional Model (often used interchangeably with cube)
XMLAXML for Analysis

Wednesday, July 28, 2010

Use a Data Warehouse for Reporting

Businesses of all types and sizes use database systems, called Online Transaction Processing (OLTP) systems, to store and manage their transactions. These OLTP systems, when well designed, store data in relational tables in a highly normalized format, that is, tables are linked to each other so as to:
  • Avoid duplication/redundancy of data
  • Minimize space requirements
  • Enable fast retrieval and updates of these transactions
OLTP systems are very efficient at storing millions of transactions. However, when building reports that need to aggregate this data (for example, calculating sum, average, percentage etc), things start to slow down. The problem becomes more pronounced as we start to filter on various criteria, such as "Total $ amount of sales for last year", or "Total $ amount of sales of Mountain Bikes in California during Q3 of last year".

This is because aggregation is being done on the fly for each query, and combined with the additional filters (WHERE clauses), quickly starts eating up our SQL Server resources. Not only are the reports slow, but other business users experience significant system slowdowns as well.

A Data Warehouse (DW) on the other hand, which is part of an Online Analytical Processing (OLAP) system, is built for one purpose alone - fast, efficient reporting.

In a DW system, data from the OLTP system will be read, de-normalized, and (typically) stored in a Star Schema, as a set of Dimension and Fact tables. This is often done using SQL Server Integration Services (SSIS). From these tables, tools within SQL Server Analysis Services (SSAS) can be used to build cubes, which contain not only our data but also its pre-calculated aggregates based upon definitions that we specify within SSAS. This results in many advantages:
  • Super-fast reporting
  • No impact on the OLTP system
  • Data Mining (DM) and Business Intelligence (BI) solutions can be readily built
The DW is not synchronized in real-time with the OLTP system. It is loaded at regular intervals, which may be as varied as once a month or several times a day, depending on the business requirements. SSIS and SSAS enable us to create packages that can be scheduled to run automatically (say, nightly), simplifying the time-consuming activities of loading of the Data Warehouse and building the Cubes.