Data Warehousing

by Kurt Thearling

Published on, December 2000

What is Data Warehousing?

Data warehousing takes a relatively simple idea and incorporates it into the technological underpinnings of a company. The idea is that a unified view of all data that a company collects will help improve operations. If hiring data can be combined with sales data, the idea is that it might be possible to discover and exploit patterns in the combined entity.

The most basic component in a data warehouse is a relational database. This database is the place where the data is stored. Relational databases are designed to be able to efficiently insert new data and locate existing data using a standardized query language. Given the fact that a company usually has very large amounts of data, the sizes of these databases can reach terabytes (trillions of bytes).

Underneath the database is a maze of connections and transformations connecting the data warehouse with other systems. Because data in a company is often created and stored in functionally specific systems (e.g., a payroll system), the data may need to be replicated and moved between a data warehouse and these other systems. There are a wide variety of tools that facilitate this replication and movement process.

The design of the data architecture is probably the most critical part of a data warehousing project. The key is to plan for growth and change, as opposed to trying to design the perfect system from the start. The design of the data architecture involves understanding all of the data and how different pieces are related. For example, payroll data might be related to sales data by the ID of the sales person, while the sales data might be related to customers by the customer ID. By connecting these two relationships, payroll data could be related to customers (e.g., which employees have ties to which customers).

Once the data architecture has been designed, you can then consider the kinds of reports that you are interested in. You might want to see a breakdown of employees by region, or a ranked list of customers by revenue. These kinds of reports are fairly simple. The power of a data warehouse becomes more obvious when you want to look at links between data associated with disparate parts of a organization (e.g., HR, accounts payable, and project management).

Consider an exception report showing all projects more than 90 days in arrears that are managed by someone with less than two years of experience. This report would be nearly impossible to generate without the links between different databases that the warehouse provides. In addition to the capability to link data together, a data warehouse can give users the ability to view data at different levels of aggregation.

You might start out looking at the total number of employees with Ph.D. across the company. The next step might be to drill down into the sales organization, to see how many people there have a Ph.D. (and so on, deeper into the company). This aggregation of data would be automatically handled by the reporting software using the raw data contained in the warehouse. Typically organizations that benefit from data warehousing have grown to the point where they are no longer able to answer the business questions that they are interested in. This usually happens because both the data volume and question complexity have grown beyond what the current systems can handle. At that point the business becomes limited by the information that users can reasonably extract from the data system.

That being said, most decisions to build data warehouses are driven by non-HR needs. Over the past decade, back office (supply chain) and front office (sales and marketing) organizations have spearheaded the creation of large corporate data warehouses. Improving the efficiency of the supply chain and competition for customers rely on the tactical uses that a data warehouse can provide. The key for other organizations, including HR, is to be involved in the creation of the warehouse so that their needs can be met by any resulting system.

[ Data Mining Page ] [ White Papers ] [ Data Mining Tutorial ]