OLAP Hybrids
It is interesting to note that many products
are beginning to create hybrid technologies that use both ROLAP and
MOLAP. Oracle Express, for example, was originally implemented with
an internal multidimensional database and required data to be
pre-summarized and loaded. Today, Oracle Express has been extended
to allow ROLAP functionality, and an Express user may query either
from the multidimensional database or directly from Oracle
relational tables. Oracle Express stores both the definition of the
multidimensional data model as well as the mapping between the OLAP
model and the relational data. Unfortunately, the mapping between
the OLAP model and the relational database is still difficult and
usually requires the assistance of an IS professional to use
Oracle's Express 4GL tool to manually maintain the mappings. Oracle
is addressing this problem by creating a graphical mapping interface
that will make it easier for the end user to map their OLAP
requirements to the Oracle RDBMS engine.
There are upper limits on the size of
multidimensional databases, and Express is suitable for storing
summarized data up to about 50 GB. These types of hybrid tools are
popular for applications where there are small standard
summarizations that can be pre-summarized and stored in the
multidimensional engine, while very large summarizations use the
ROLAP component of Express. Also, the availability of Oracle?s
parallel server has allowed SMP and MPP technology to be applied to
operational Oracle databases, greatly improving the speed of ROLAP
extracts and summarization.
Alternatives To OLAP Data Representation
Many traditional database designs can be
used to simulate a data cube. One alternative to the cubic
representation would be to leave the table in linear form, using SQL
to join the table against itself to produce a result, as shown in
Figure 5.19.
Figure 5.19 Joining a relational table
against itself.
Let's take a look at some queries that might
require the self-joining of a table. For example:
* Show all customers
in Hawaii who purchased our product more than 500 times.
* Show all customers
in Los Angeles who purchase less than 10 times per month.
* Show all large
customers (buying more than 100 items per month) in Alaska whose
usage has dropped more than 10 percent in 1995.
* Show all customers
in New York whose usage in March of 1990 deviated more than 20
percent from their usage in March of 1995.
* Show all customers
in California where the company name contains Widget and usage has
dropped more than 20 percent in 1995.
Figure 5.19 shows how a large sales summary
table can be logically partitioned into pieces by extracting rows
according to the year and month. With a single table such as
this one, we can issue SQL that will join the table against itself
to compare two date ranges. This is a very powerful technique
for using SQL to perform sophisticated variance analysis without
buying expensive tools. Listing 5.1 shows that a subset of this data
can be extracted such that only California sites with more than 100
uses per month are displayed. For display, we chose percentage
variance, number of requests, site number, ZIP code, and city. Note
the sort order of the report in Listing 5.1; it is sorted first by
ZIP, followed by city, and then by percentage variance within city.
Listing 5.1 Sophisticated variance
analysis with Oracle SQL.
SELECT
INTEGER
(((E.NUMBER_OF_SALES - S.NUMBER_OF_SALES) / S.NUMBER_OF_SALES) *
100) ,
E.CUSTOMER_NAME , E.CITY_NAME , E.ZIP , S.NUMBER_OF_SALES ,
E.NUMBER_OF_SALES
FROM DETAIL S , DETAIL E
WHERE
S.CUSTOMER_NAME = E.CUSTOMER_NAME
AND
E.STATE_ABBR = 'CA'
AND
E.DATE_YYMM = 9601
AND
S.DATE_YYMM= 9701
AND
E.NUMBER_OF_SALES < S.NUMBER_OF_SALES - (.05 * S.NUMBER_OF_SALES)
ORDER BY
E.ZIP ASC , E.CITY_NAME ASC , 1 ;
Note that the variance analysis in Listing
5.1 is done directly in the SQL statement. This case displays
California users whose usage has dropped by more than 5 percent
(comparing January 1996 to January 1997).
But, what if the user wants to compare one
full year with another year? The table is structured for simple
comparison of two specific month dates, but the SQL query could be
modified slightly to aggregate the data, offering a comparison of
two ranges of dates.
The query shown in Listing 5.2 will
aggregate all sales for an entire year and compare 1996 with 1997.
Here, we meet the request show me all customers in California whose
sales have dropped by more than 5 percent between 1996 and 1997.
Listing 5.2 Aggregating sales for an
entire year.
SELECT
INTEGER
(((E.NUMBER_OF_SALES - S.NUMBER_OF_SALES) / S.NUMBER_OF_SALES) *
100) ,
E.CUSTOMER_NAME , E.CITY_NAME , E.ZIP , S.NUMBER_OF_SALES ,
E.NUMBER_OF_SALES
FROM DETAIL S , DETAIL E
WHERE
S.CUSTOMER_NAME = E.CUSTOMER_NAME
AND
E.STATE_ABBR = 'CA'
AND
substr(E.DATE_YYMM,1,2) = "96"
AND
substr(S.DATE_YYMM,1,2) = "97"
AND
E.NUMBER_OF_SALES < S.NUMBER_OF_SALES - (.05 * S.NUMBER_OF_SALES)
ORDER BY
E.ZIP ASC , E.CITY_NAME ASC , 1 ;
On the surface, it appears that SQL can be
used against two-dimensional tables to handle three-dimensional
time-series problems. It also appears that SQL can be used to roll
up aggregations at runtime, alleviating the need to do a roll up at
load time, as with a traditional database. While this implementation
does not require any special multidimensional databases, two
important issues remain unresolved:
*
Performance--Joining a table against itself--especially when
comparing ranges of dates--may create many levels of nesting in the
SQL optimization and poor response time.
* Ability--Most end
users would not be capable of formulating this type of sophisticated
SQL query.
If you strip away all of the marketing hype
and industry jargon, you can see that a data warehouse and a
multidimensional database can be easily simulated by pre-creating
many redundant tables, each with pre-calculated roll-up information.
In fact, the base issue is clear--complex aggregation needs to be
computed at runtime or data load time.
|
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |