Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Oracle Data Warehouse Aggregation

Oracle Data Warehouse Tips by Burleson Consulting

Data Aggregation

Several methods can be used to aggregate data within OLAP servers. As you can see in Figure 1.16, this method extracts data from the relational engine and summarizes the data for display. Another popular method pre-aggregates the data and keeps the summarized data ready for retrieval.

Figure 1.16  Aggregation and OLAP servers.

For details about data aggregation in Oracle warehouses, see Chapter 10, Aggregating Data For Oracle Warehouses.

Economic Factors

Two major changes have occurred over the past several years that have driven the movement toward data warehousing. These changes are:

*     Disk space became inexpensive--One gigabyte of disk carried a price tag of $100,000 in 1988. Today, one gigabyte is less than $1,000. To support large data warehouses, it is not uncommon to require terabytes of disk storage.

*     The movement into open systems--The migration away from centralized processors has led to data residing on a plethora of different computer and database architectures.

Metadata And The Oracle Warehouse

Because data is collected from a variety of sources, many warehouse installations find it necessary to create a metadata repository. But, what is the role of a metadata repository? When data is consolidated from a variety of diverse systems, many intrasystem problems can arise, such as:

*     Homonyms--Different columns with the same name.

*     Synonyms--The same column with different names.

*     Unit incompatibilities--Inches versus centimeters, dollars versus yen, and so on.

*     Enterprise referential integrity--Business rules that span operational systems.

*     The warehouse design rules--Determine how the tables will be built.

*     Horizontal de-normalization (chunking tables based on time periods)--Horizontal de-normalization refers to the process of splitting a very large table into smaller sub-tables.

*     Vertical de-normalization-- Vertical de-normalization refers to the process of splitting columns out of very wide tables and storing the separated columns in other database nodes.

*     Using multidimensional front ends?In many cases a multidimensional front-end is designed to make the relational data appear multidimensional.

For details on data aggregations, see Chapter 10, Aggregating Data For Oracle Warehouses.

ROLAP And Oracle Warehouses

There are alternatives to using a pure multidimensional database (MDDB). One common approach is to insert a metadata server between the OLTP relational database and the query tool, as shown in Figure 1.17.

Figure 1.17  Using metadata repositories for multidimensional databases.

Examples of this approach include:

*     DSS Agent, by Microstrategies

*     Metacube, by Stanford Technology Group

*     Holos, by Holistic Systems

Problem Solving For The Oracle Warehouse

If we presume that the main goal of an Oracle data warehouse is to help management answer business questions, then we will need to have some insight into the types of questions that management may ask. For classification purposes, all of the queries against a data warehouse will fall into one of the following categories:

*     Statistical analysis-- Computation of sums, averages, and correlations.

*     Multivariate analysis--Comparing classes of database objects with each other to analyze patterns.

*     Simulation and modeling--Using the data warehouse to validate a hypothesis.

*     Forecasting--Using the warehouse to predict future values.

*     Aggregation--Composing new objects from existing data structures.

Statistical Analysis And Oracle

In general, statistical analysis is one of the easiest to implement. Standard Oracle SQL can be used to calculate sums, averages, and aggregate values, and front-end statistical packages such as SAS and SPSS are also commonly used for this purpose. Statistical requests may include multivariate analysis, simulation and modeling, forecasting (linear regression), and aggregation.

Multivariate Analysis

This type of analysis includes comparing various classifications of database objects. Multivariate analysis involves the use of chi-square statistical techniques to compare ranges of values among different classifications of objects. For example, a supermarket may keep a data warehouse of each customer transaction. Multivariate techniques are used to search for correlations among items, so the supermarket management can place these correlated items nearby on the shelves. One supermarket found that whenever males purchased diapers, they were also likely to buy a six-pack of beer! As a consequence, the supermarket was able increase sales of beer by placing a beer display between the diaper displays and the checkout lines.

Multivariate analysis is normally used when the answers to the query are unknown and is commonly associated with data mining and neural networks. For example, whereas a statistical analysis may query to see what the correlation is between customer age and probability of diaper purchases when the end user suspects a correlation, multivariate analysis is used when the end user does not know what correlation may exist in the data. A perfect example of multivariate analysis can be seen in the analysis of the Minnesota Multi-phasic Personality Inventory (MMPI) database. MMPI is one of the most popular psychological tests in America, and millions of Americans have taken this exam. By comparing psychological profiles of subjects with diagnosed disorders to their responses to the exam questions, Psychologists have been able to generate unobtrusive questions which are very highly correlated with a specific mental illness. One example question relates to a subject?s preference to take showers versus baths. Answers to this question are very highly correlated with the MMPI?s measure for self-esteem. (It turns out that the correlation showed that shower-takers tend to have statistically higher self-esteems than bath-takers.)

Note that the users of this warehouse do not seek answers about why the two factors are correlated; they simply look for statistically valid correlations. This approach has made the MMPI one of the most intriguing psychological tests in use today; by answering the seemingly innocuous 500 True/False questions, psychologists can gain an incredible insight into the personality of a respondent.

Simulation And Modeling

Simulation modeling is not new for data warehouses. In fact, one of the first object-oriented languages, called SIMULA, dates from the late 1960s. Simulation and modeling are generally used with decision-support warehouses where the end user develops a hypothesis and uses the data warehouse to test it. For example, the end user of a shipping warehouse might ask, ?What would happen to my total costs if we change from shipper A to shipper B?? Modeling involves taking the data warehouse as input to a model for testing the validity of the model. For instance, a model may have been developed to predict the propensity of a particular type of customer to purchase a particular class of product. This model can be validated against the data warehouse.

Forecasting

Forecasting involves using a data warehouse to predict future values of data items. While forecasting is not new to computer processing, the use of massive data warehouses can create a new vehicle for validating long-term forecasts.

Forecasting has long suffered from a statistical reality called the trumpet of doom (see Figure 1.18). While it is very easy to generate a forecast for a short-term future value within an alpha of .05 (alpha is a measure of the ?confidence interval? for a forecast), a long-term forecast will see the confidence interval widening as time passes.

Figure 1.18  The trumpet of doom.

Following are the four methods for nonlinear regression forecasting:

*     Sum of the least squares.

*     Single exponential smoothing.

*     Double exponential smoothing.

*     Triple exponential smoothing.

These techniques are applied based upon the nature of the curve exhibited in the baseline data.  These forecasting methods are well known to computerized statistical packages, and many tools such as the SAS system hide the smoothing method from the end-user.  Today?s data warehouse tools enable the query system to choose the most appropriate forecasting method based on the characteristics of the historical data. This relieves the end user from the cumbersome task of applying different smoothing methods to each forecast.  These smoothing methods are described in detail in any advanced statistics textbook that explains linear regression techniques.

Aggregation

Another common use for statistical analysis is the recomposition of data into new forms. As we know from the relational database model, data is stored in tables at the atomic level. That is, many composite objects must be constructed from their pieces (see Figure 1.19).

Figure 1.19  Composition of new database objects.

In Figure 1.19, you can see that the data warehouse user desires to model an aggregate object so that the behavior of the artificial aggregate can be compared with the behavior of other aggregate objects. In other words, the data warehouse manager may wish to analyze the nature of tuition bills, class schedules, and report cards, even though these entities are composed, or created, from different components in the OLTP database. This type of aggregation is often achieved through massive de-normalization of the data structures when the data warehouse is designed.

 

If you like Oracle tuning, see the book "Oracle Tuning: The Definitive Reference", with 950 pages of tuning tips and scripts. 

You can buy it direct from the publisher for 30%-off and get instant access to the code depot of Oracle tuning scripts.


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.

Remote Emergency Support provided by Conversational