Several methods can be used to aggregate
data within OLAP servers. As Figure 1.16 shows, one method
extracts data from the relational engine and summarizes the data for
display. Another popular method pre-aggregates the data and keeps
the summarized data ready for retrieval.
Figure 1.16 Aggregation and OLAP servers.
For details about data aggregation in Oracle
warehouses, see Chapter 10, Aggregating Data For Oracle Warehouses.
Two major changes have occurred over the
past several years that have driven the movement toward data
warehousing. These changes are:
* Disk space became inexpensive--One
gigabyte of disk carried a price tag of $100,000 in 1988. Today, one
gigabyte is less than $1,000. To support large data warehouses, it
is not uncommon to require terabytes of disk storage.
* The movement into open systems--The
migration away from centralized processors has led to data residing
on a plethora of different computer and database architectures.
Metadata And The Oracle Warehouse
Because data is collected from a variety of
sources, many warehouse installations find it necessary to create a
metadata repository. But, what is the role of a metadata repository?
When data is consolidated from a variety of diverse systems, many
intersystem problems can arise, such as:
* Homonyms--Different columns with the same name.
* Synonyms--The same column with different names.
* Unit incompatibilities--Inches versus
centimeters, dollars versus yen, and so on.
* Enterprise referential
integrity--Business rules that span operational systems.
* The warehouse design rules--Determine
how the tables will be built.
* Horizontal de-normalization (chunking tables based on time periods)--Horizontal
de-normalization refers to the process of splitting a very large
table into smaller sub-tables.
* Vertical de-normalization--Vertical
de-normalization refers to the process of splitting columns out of
very wide tables and storing the separated columns in other database
tables.
* Using multidimensional front ends--In
many cases, a multidimensional front end is designed to make the
relational data appear multidimensional.
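The pivoting that such a front end performs can be illustrated with a minimal Python sketch, in which flat relational rows are rolled up into a small two-dimensional cube (the table values and dimension names here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical atomic sales rows: (region, quarter, amount)
rows = [
    ("EAST", "Q1", 100), ("EAST", "Q2", 150),
    ("WEST", "Q1", 200), ("WEST", "Q2", 250),
    ("EAST", "Q1", 50),
]

# Pivot the relational rows into a two-dimensional cube:
# cube[region][quarter] = total sales
cube = defaultdict(lambda: defaultdict(int))
for region, quarter, amount in rows:
    cube[region][quarter] += amount

print(cube["EAST"]["Q1"])  # 150
print(cube["WEST"]["Q2"])  # 250
```

The relational rows stay atomic; only the front end presents them along the region and quarter dimensions.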
For details on data aggregations, see
Chapter 10, Aggregating Data For Oracle Warehouses.
ROLAP And Oracle Warehouses
There are alternatives to using a pure
multidimensional database (MDDB). One common approach is to insert a
metadata server between the OLTP relational database and the query
tool, as shown in Figure 1.17.
Figure 1.17 Using metadata repositories for ROLAP.
Examples of this approach include:
* DSS Agent, by MicroStrategy
* Metacube, by Stanford Technology Group
* Holos, by Holistic Systems
Problem Solving For The Oracle Warehouse
If we presume that the main goal of an
Oracle data warehouse is to help management answer business
questions, then we will need to have some insight into the types of
questions that management may ask. For classification purposes, all
of the queries against a data warehouse will fall into one of the
following categories:
* Statistical analysis--Computation of
sums, averages, and correlations.
* Multivariate analysis--Comparing
classes of database objects with each other to analyze patterns.
* Simulation and modeling--Using the
data warehouse to validate a hypothesis.
* Forecasting--Using the warehouse to
predict future values.
* Aggregation--Composing new objects
from existing data structures.
Statistical Analysis And Oracle
In general, statistical analysis is one of
the easiest to implement. Standard Oracle SQL can be used to
calculate sums, averages, and aggregate values, and front-end
statistical packages such as SAS and SPSS are also commonly used for
this purpose. Statistical requests may include multivariate
analysis, simulation and modeling, forecasting (linear regression),
and aggregation.
Multivariate Analysis
This type of analysis includes comparing
various classifications of database objects. Multivariate analysis
involves the use of chi-square statistical techniques to compare
ranges of values among different classifications of objects. For
example, a supermarket may keep a data warehouse of each customer
transaction. Multivariate techniques are used to search for
correlations among items, so the supermarket management can place
these correlated items nearby on the shelves. One supermarket found
that whenever males purchased diapers, they were also likely to buy
a six-pack of beer! As a consequence, the supermarket was able to
increase sales of beer by placing a beer display between the diaper
displays and the checkout lines.
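The chi-square technique behind such a finding can be sketched over a 2x2 contingency table in pure Python (the transaction counts are invented for illustration):

```python
# 2x2 contingency table of transaction counts (hypothetical):
#                 bought beer   no beer
# bought diapers       60          40
# no diapers           30          70
table = [[60, 40], [30, 70]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (observed - expected) ** 2 / expected

print(round(chi2, 2))  # 18.18
```

A statistic this far above 3.84 (the 5 percent cutoff for one degree of freedom) indicates that the diaper and beer purchases are correlated rather than independent.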
Multivariate analysis is normally used when
the answers to the query are unknown and is commonly associated with
data mining and neural networks. For example, whereas a statistical
analysis may query to see what the correlation is between customer
age and probability of diaper purchases when the end user suspects a
correlation, multivariate analysis is used when the end user does
not know what correlation may exist in the data. A perfect example
of multivariate analysis can be seen in the analysis of the
Minnesota Multi-phasic Personality Inventory (MMPI) database. MMPI
is one of the most popular psychological tests in America, and
millions of Americans have taken this exam. By comparing
psychological profiles of subjects with diagnosed disorders to their
responses to the exam questions, psychologists have been able to
generate unobtrusive questions that are highly correlated with
a specific mental illness. One example question relates to a
subject's preference for showers versus baths. Answers to this
question are very highly correlated with the MMPI's measure for
self-esteem. (It turns out that the correlation showed that
shower-takers tend to have statistically higher self-esteem than
bath-takers.)
Note that the users of this warehouse do not
seek answers about why the two factors are correlated; they simply
look for statistically valid correlations. This approach has made
the MMPI one of the most intriguing psychological tests in use
today; by answering the seemingly innocuous 500 True/False
questions, psychologists can gain an incredible insight into the
personality of a respondent.
Simulation And Modeling
Simulation modeling is not new for data
warehouses. In fact, one of the first object-oriented languages,
called SIMULA, dates from the late 1960s. Simulation and modeling
are generally used with decision-support warehouses where the end
user develops a hypothesis and uses the data warehouse to test it.
For example, the end user of a shipping warehouse might ask, "What
would happen to my total costs if we change from shipper A to
shipper B?" Modeling involves taking the data warehouse as input to
a model for testing the validity of the model. For instance, a model
may have been developed to predict the propensity of a particular
type of customer to purchase a particular class of product. This
model can be validated against the data warehouse.
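Validating such a model against the warehouse amounts to comparing its predictions with the observed outcomes. A minimal sketch in Python, where the model rule and the warehouse rows are both invented for illustration:

```python
# Hypothetical warehouse rows: (customer_age, bought_product)
history = [(25, True), (34, True), (52, False), (61, False), (29, True)]

def model_predicts_purchase(age):
    """Toy propensity model under test: customers under 40 buy the product."""
    return age < 40

# Validate the model against the observed warehouse outcomes
hits = sum(model_predicts_purchase(age) == bought for age, bought in history)
accuracy = hits / len(history)
print(accuracy)  # 1.0 -- every prediction matched the warehouse data
```

A low accuracy against the historical data would send the analyst back to refine the hypothesis before trusting the model's predictions.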
Forecasting
Forecasting involves using a data warehouse
to predict future values of data items. While forecasting is not new
to computer processing, the use of massive data warehouses can
create a new vehicle for validating long-term forecasts.
Forecasting has long suffered from a
statistical reality called the trumpet of doom (see Figure 1.18).
While it is very easy to generate a forecast for a short-term future
value within an alpha of .05 (alpha is a measure of the "confidence
interval" for a forecast), a long-term forecast will see the
confidence interval widening as time passes.
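The widening interval can be illustrated numerically. Under a simple random-walk assumption (one of many possible forecast models, chosen here only for illustration), the standard error of an h-step-ahead forecast grows with the square root of h:

```python
import math

# Hypothetical one-step-ahead forecast standard error
sigma = 10.0

# Half-width of a 95% confidence interval at horizons 1, 4, and 16:
# it grows with sqrt(h), tracing out the trumpet of doom.
half_widths = {h: round(1.96 * sigma * math.sqrt(h), 1) for h in (1, 4, 16)}
print(half_widths)  # {1: 19.6, 4: 39.2, 16: 78.4}
```

Quadrupling the horizon doubles the interval, which is why long-range forecasts carry so much less certainty than short-range ones.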
Figure 1.18 The trumpet of doom.
Following are the four methods for
nonlinear regression forecasting:
* Sum of the least squares.
* Single exponential smoothing.
* Double exponential smoothing.
* Triple exponential smoothing.
These techniques are applied based upon the
nature of the curve exhibited in the baseline data. These
forecasting methods are well known to computerized statistical
packages, and many tools such as the SAS system hide the smoothing
method from the end user. Today's data warehouse tools enable the
query system to choose the most appropriate forecasting method based
on the characteristics of the historical data. This relieves the end
user from the cumbersome task of applying different smoothing
methods to each forecast. These smoothing methods are described in
detail in any advanced statistics textbook that explains linear
regression.
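Single exponential smoothing, the simplest of the four methods, can be sketched in a few lines of Python. Note that the smoothing constant is conventionally also called alpha, but it is unrelated to the confidence-interval alpha discussed earlier; the baseline values here are invented for illustration:

```python
def single_exponential_smoothing(series, alpha):
    """Smooth a series: alpha near 1 tracks the data closely,
    alpha near 0 smooths heavily."""
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical quarterly baseline from the warehouse
baseline = [100.0, 110.0, 104.0, 120.0]
forecast = single_exponential_smoothing(baseline, alpha=0.5)
print(forecast)  # [100.0, 105.0, 104.5, 112.25]
```

The double and triple variants apply the same recurrence a second and third time to capture trend and seasonality, which is exactly the choice that tools such as SAS hide from the end user.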
Aggregation
Another common use for statistical analysis
is the recomposition of data into new forms. As we know from the
relational database model, data is stored in tables at the atomic
level. That is, many composite objects must be constructed from
their pieces (see Figure 1.19).
Figure 1.19 Composition of new database objects.
In Figure 1.19, you can see that the data
warehouse user desires to model an aggregate object so that the
behavior of the artificial aggregate can be compared with the
behavior of other aggregate objects. In other words, the data
warehouse manager may wish to analyze the nature of tuition bills,
class schedules, and report cards, even though these entities are
composed, or created, from different components in the OLTP
database. This type of aggregation is often achieved through massive
de-normalization of the data structures when the data warehouse is
created.
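Composing one such aggregate object from its atomic pieces can be sketched in Python. Here a hypothetical tuition bill is assembled from student and charge rows, much as it would be materialized through de-normalization at load time:

```python
# Atomic OLTP-style rows (hypothetical)
students = {101: "Jones", 102: "Smith"}
charges = [
    (101, "TUITION", 4000.0),
    (101, "LAB FEE", 150.0),
    (102, "TUITION", 4000.0),
]

def compose_tuition_bill(student_id):
    """Build the aggregate 'tuition bill' object from its atomic pieces."""
    items = [(desc, amt) for sid, desc, amt in charges if sid == student_id]
    return {
        "student": students[student_id],
        "items": items,
        "total": sum(amt for _, amt in items),
    }

bill = compose_tuition_bill(101)
print(bill["total"])  # 4150.0
```

Once such aggregates are pre-built in the warehouse, their behavior can be compared directly, without re-joining the atomic tables for every query.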