Important 2007 update:
The rules for identification of candidates for index rebuilding are
changing. Please see my
updated notes on index rebuilding.
Predict if a table or index will benefit from reorganization
February 17, 2005 (Revised March 29, 2005)
Reliably predicting when to reorganize your
tables and indexes is an important decision, and one that you can be
the first to automate. This has become the “Holy Grail” for many
Oracle professionals, with numerous articles and
debates about predicting the benefit of a reorg. Some
purists say that it is better to design tables and indexes that never
fragment in the first place, but the reality for the working Oracle DBA is
that high staff attrition, vendor packages, and the inability to control
the developers make this a very important real-world problem.
It is clear that Oracle will soon
leverage its time-series performance repository,
the AWR, to create predictive models:
- In an OracleWorld 2003
presentation titled “Oracle Database 10g: The Self-Managing
Database,” Sushil Kumar of Oracle Corporation
states that the new Automatic Maintenance Tasks (AMT) Oracle10g
feature will “automatically detect and re-build sub-optimal
indexes.”
- In a paper titled “Metric Baselines:
detecting and explaining performance events in em 10gr2” (presented at the
RMOUG 2005 Training Days), John Beresniewicz of Oracle Corporation
notes that the use of “baselines to capture and adapt thresholds to expected
time-dependent workload variations” is a core feature of the
next release of Oracle.
The Objective:
The goal is to create a reliable predictive
model that will suggest tables and indexes that will measurably
benefit from reorganization, predict the reduction in I/O (logical
I/O - consistent gets, and physical I/O - physical reads)
after the reorganization, and suggest changes that will prevent a
recurrence of the fragmentation (i.e. a new pctfree, a new
blocksize, etc.):
- Oracle tables and indexes can be reorganized,
sometimes resulting in faster SQL execution speeds.
- AWR (STATSPACK) has a history of how the
tables were accessed by historical SQL (dba_hist_sql_plan,
stats$sql_plan, etc.), including the I/O and CPU costs
associated with each step of SQL execution.
- We can see the current internal structure
of every table and index (e.g. chain_cnt,
clustering_factor, etc.).
- Most databases have repeating patterns of
behavior, and historical SQL is usually representative of future
activity. The model should be able to determine whether a
table or index will benefit simply by examining the historical
access patterns.
The steps to solving this problem are simple,
but the process logic is complex:
1 - Gather table/index structure information - There are
many examples of routines that can perform an “alter index xxx validate
structure” command and store the results in a table for
analysis.
2 - Analyze historical patterns of usage - We can alter the
plan9i.sql script to access the stats$sql_plan
table and see exactly how tables and indexes are accessed by our SQL,
and how often.
3 - Add the decision rules - Design an expert-system
algorithm that will interrogate your data from steps 1 and 2 and
create the list of tables/indexes, predicted I/O reductions, and
suggested structural changes.
4 - Test and refine your rules - Iterate through step 3,
successively refining your rules based on actual results.
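The four steps above can be sketched as a simple driver loop. This is a hypothetical Python sketch, not a finished implementation: the stats dictionaries stand in for rows gathered in steps 1 and 2 (from index_stats, dba_tables, stats$sql_plan, etc.), and the rule functions are placeholders for the step-3 expert-system logic.

```python
# Hypothetical sketch of steps 3 and 4: apply a pluggable rule-set to
# the structure/usage statistics gathered in steps 1 and 2. Each rule
# is a function taking a stats dictionary and returning True when the
# object is a reorg candidate; refining this list IS the step-4 loop.

def recommend_reorgs(objects, rules):
    """Return (object_name, [triggered_rule_names]) for every object
    that at least one decision rule flags as a reorg candidate."""
    recommendations = []
    for name, stats in objects.items():
        reasons = [rule.__name__ for rule in rules if rule(stats)]
        if reasons:
            recommendations.append((name, reasons))
    return recommendations
```

Each refinement pass of step 4 simply swaps in a revised list of rule functions and re-checks the predictions against actual post-reorg results.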
Developing your rule-set
But how can we find these rules? Do the
rules have to make sense? No, they don't! Scientists
commonly create models (as the Oracle data mining tools do) that scan
historical data and identify statistically significant correlations
(within two standard deviations of the mean value). For example,
the popular MMPI test is a set of 500 true/false questions that assesses
personality with remarkable validity, and its results are accepted
in all U.S. courts. Its test-base consists of hundreds of
thousands of subjects, each with a pre-diagnosed mental disorder (see
DSM IV).
By comparing their responses to seemingly innocuous questions (e.g.
"I read the editorials in the newspaper every day"), a proven
predictive model was created (Federal courts have affirmed the MMPI
as a scientifically valid and accepted procedure for personality
assessment).
For example, a subject's preference for
showers vs. baths is an extremely reliable measure of self-esteem.
Do we know why? No. Do we care? Not really. All
that is proven is that this correlation is a statistically reliable
predictor of feelings of self-worth. We see the exact same
scientific principle applied in the Oracle data mining (ODM) tools.
For example, we might find out that people with red hair buy a
disproportionate amount of skin care products. Knowing "why"
is not important. What's important is knowing that the data
supports the assertion. Also useful is the book "Unobtrusive
Measures", which shows creative techniques for finding "hidden"
significant metrics.
In sum, rules don't have to be proven true to
be statistically reliable, and exceptions do not make the rule
invalid. For example, if two out of every 1,000 red-haired
people don't buy skin care products, we still have a model with a
99.8% predictive quality.
Some possible Rules for your AI engine:
To get you started, here is a list of possible
conditions that may contribute to table/index fragmentation.
Remember, you don't need to pre-justify your rule set. The
only thing that counts is that your rules make accurate predictions.
- Index fast full scans will run faster after an index
reorganization whenever the reorganization increases the “density”
of the index entries. In other words, it takes less time to
read 100,000 entries from a 100-block index than to read the same
entries from a 500-block index.
- Multi-block index range scans will run faster when the
data blocks are arranged in index-key order (as evidenced by
clustering_factor in dba_indexes) and when the data
blocks have a high number of row entries.
- Large-table full-table scans will run faster after
reorganization when the table has excessive chained or relocated
rows, or low block density after massive DML (updates and
deletes).
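The three conditions above can be encoded directly as boolean predicates over the structure statistics. Here is a hedged Python sketch: the input names mirror dba_tables/dba_indexes columns (chain_cnt, clustering_factor), but every threshold is an illustrative assumption to be refined during step-4 testing, not an Oracle-published value.

```python
# Illustrative decision rules; all thresholds are assumptions to refine.

def index_density_rule(num_entries, leaf_blocks, entries_per_block=1000):
    """Fast full scans benefit when a rebuild would pack the same
    entries into far fewer leaf blocks (assumed target density)."""
    optimal_blocks = max(1, num_entries // entries_per_block)
    return leaf_blocks > 1.5 * optimal_blocks  # >50% excess blocks

def clustering_rule(clustering_factor, table_blocks, num_rows):
    """Range scans benefit when rows are scattered: a clustering_factor
    near num_rows means one table-block visit per row, while a value
    near table_blocks means rows are already in index-key order."""
    return clustering_factor > (table_blocks + num_rows) / 2

def chained_rows_rule(chain_cnt, num_rows, max_chained_pct=0.05):
    """Full scans benefit when more than ~5% of rows are chained or
    migrated (chain_cnt comes from an ANALYZE of the table)."""
    return num_rows > 0 and chain_cnt / num_rows > max_chained_pct
```

With the example from the first rule, 100,000 entries spread over 500 blocks trip the density check, while the same entries packed into 100 blocks do not.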
Proving the Benefit:
- Full-table scans - The predictive model will
have a rate, expressed as a function of the number of rows and the
amount of I/O (both logical and physical) required to
access the rows via a “db file scattered read” (usually a full-table
scan, an index fast-full scan, etc.).
                           CONSISTENT GETS   PHYSICAL READS
DATE     TABLE     BLOCKS  PER 1,000 ROWS    PER 1,000 ROWS
6/20/05  CUSTOMER  100,000       56                72

-- TABLE REORGED ON 6/20/05 --

6/30/05  CUSTOMER   99,000       20                11
7/30/05  CUSTOMER   99,000       35                32
8/30/05  CUSTOMER   99,000       49                55
The report output
should show the forecasted reduction in I/O after the table was
reorganized.
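The per-1,000-row rates in the report above can be derived from raw snapshot-delta counters. A minimal sketch (the helper names are hypothetical; the inputs would come from AWR or STATSPACK deltas between two snapshots):

```python
def io_per_1000_rows(consistent_gets, physical_reads, rows_processed):
    """Normalize raw snapshot-delta I/O counters to per-1,000-row
    rates, as shown in the CUSTOMER report above."""
    thousands = rows_processed / 1000.0
    return (consistent_gets / thousands, physical_reads / thousands)

def forecast_reduction(rate_before, rate_after):
    """Percent reduction in an I/O rate after reorganization."""
    return 100.0 * (rate_before - rate_after) / rate_before
```

For the report above, consistent gets dropping from 56 to 20 per 1,000 rows is roughly a 64% reduction in logical I/O.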
- Multi-block index range scans - The predictive
model should be able to see the size of each index range scan (the
number of rows returned), the number of blocks required to
fetch the rows, and the number of logical and physical I/Os.
- Index rebuild - The predictive model should
capture the “speed” of index fast-full scans and multi-block index
range scans, and show the degradation over time.
Finding the Data:
The Oracle 10g AWR (and STATSPACK in previous
releases) provides a time-series database. These time-series
performance details can be used to create accurate predictions of
the real benefit of reorganizing tables and indexes. Specifically,
the AWR can measure, between any two snapshot values:
1 - The change in the speed of SQL execution
steps over time.
In the prototype below, we see the change in the speed of
full-table scans:
DATE              OPERATION                        AVG. TIME    NBR OF BLOCKS
6/20/05 10:00 AM  TABLE ACCESS FULL (CUSTOMER)      12.03 secs  100,000
6/21/05 11:00 AM  TABLE ACCESS FULL (CUSTOMER)      14.03 secs  100,000
6/22/05 12:00 PM  TABLE ACCESS FULL (CUSTOMER)      16.03 secs  100,000
6/23/05 01:00 PM  TABLE ACCESS FULL (CUSTOMER)      18.03 secs  100,000

6/20/05 10:00 AM  INDEX FFS (CUSTOMER_FULL_IDX)    242.03 secs  100,000
6/21/05 11:00 AM  INDEX FFS (CUSTOMER_FULL_IDX)    311.03 secs  100,000
6/22/05 12:00 PM  INDEX FFS (CUSTOMER_FULL_IDX)    350.03 secs  100,000
6/23/05 01:00 PM  INDEX FFS (CUSTOMER_FULL_IDX)    400.03 secs  100,000
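The degradation visible in a report like this can be quantified by fitting a line to the scan times between snapshots. A sketch, assuming equally spaced daily snapshots (plain least-squares, no library needed):

```python
def daily_degradation(scan_times):
    """Least-squares slope (seconds per snapshot interval) of scan
    time across equally spaced snapshots; a persistently positive
    slope flags progressive table or index fragmentation."""
    n = len(scan_times)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scan_times) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scan_times))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

The full-table scans above degrade by exactly 2 seconds per day, while the index fast-full scans degrade far faster; that difference is what would rank CUSTOMER_FULL_IDX high on the rebuild list.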
2 - The state of the object at a historical
snapshot time:
The AWR contains
important information about the data distribution and density of the
data blocks within each Oracle segment:
Table Segment Report:

DATE              SEGMENT     BLOCKS   NUM_ROWS     FREELIST LENGTH
6/20/05 10:00 AM  CUST_TABLE  100,000  500,000,000  77
Sample Index Report:

Index                Average  Average      Average    Average Leaf  Average      Average Leaf
Name      Date       Height   Leaf Blocks  Leaf Rows  Row Length    Leaf Blocks  Block Length
CUST_IDX  6/19/05    1        236          12         234           1            7,996
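A report like this supports a quick density check: from the average leaf row length we can estimate how many leaf blocks a freshly rebuilt index would need and compare that with the actual count. A sketch, assuming an 8 KB block size and pctfree 10 (both assumptions) and deliberately ignoring block overhead:

```python
def estimated_leaf_blocks(num_entries, avg_entry_len,
                          block_size=8192, pctfree=10):
    """Rough lower bound on leaf blocks after a rebuild: usable bytes
    per block divided by the average leaf row length (ITL and block
    header overhead are ignored in this estimate)."""
    usable = block_size * (100 - pctfree) / 100.0
    entries_per_block = max(1, int(usable // avg_entry_len))
    return -(-num_entries // entries_per_block)  # ceiling division
```

For CUST_IDX above (236 leaf blocks averaging only 12 rows of 234 bytes each), the estimate says the same entries would fit in far fewer blocks, making the index a strong rebuild candidate.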
The Method:
The solution does not have to generate the
syntax for the reorg; it need only suggest table and index names, predict
the improvement, and suggest changes to the parameter settings or table
structure to prevent a subsequent reorg:
- Table reorgs can be done with
dbms_redefinition, CTAS, or “alter table move” syntax.
- Index reorgs are performed by rebuilding
the index with the “alter index xxx rebuild” command.
- Boolean rules and process control flow can
be implemented in PL/SQL stored procedures.
The Challenge:
Since Oracle is working on this same problem,
it will be interesting to see if a scientist DBA can develop a model
with a higher rate of success. This is not a trivial problem, and a
statistically valid DBA predictive model has eluded many of the
world's top DBAs. However, I believe that the information is
there, buried deep inside the STATSPACK and AWR tables. We just
need a savvy DBA who can measure historical performance and
determine those characteristics that identify tables and indexes
that will show measurable benefits from reorganization. I will also
promote all working solutions that I receive, to show how Oracle10g
can be used to solve complex DBA problems. Predictive forecasting is
here now, and it's the very core of the future of Oracle database
administration. Test your DBA skills and try this important DBA
challenge! Good luck!