
Cost
Control: Inside the Oracle Optimizer
By Donald K. Burleson
OTN Member since 2001
Designing
new applications for the Oracle Cost-Based Optimizer?
Here's the latest information about how it works. |
|
|
This article has the following sections:
PART 5 - Selectivity, Clustering, and Histograms
It is important to remember that the
optimizer has knowledge of many important characteristics of column data within tables, most notably the selectivity of a column values
and the clustering factor for the column.
For example, here we see a query using a
column value to filer the result set:
select
customer_name
from
customer
where
customer_state = 'Rhode Island';
In this example, the choice to use an index
versus a full-table scan is influenced by the proportion of customers in Rhode Island. If there are a super-small proportion of
customers in Rhode Island and the values are clustered on the data blocks, then an index scan might be the
fastest execution plan for this query.
Many Oracle developers are perplexed when
the optimizer chooses a full-table scan when they are only
retrieving a small number of rows, not realizing that the optimizer is
considering the clustering of the column values
within the table.
Oracle provides a column called
clustering_factor in the
dba_indexes view that tells the optimizer
how synchronized the table rows are with
the index. When the clustering factor is close to the number of data blocks, the table rows are synchronized with the index.
The selectivity of a column value, the
db_block_size, the
avg_row_len and the cardinality all work together in helping the optimizer decide whether to use and index
versus using a full-table scan. If a data column has high selectivity and a low clustering_factor,
then an index scan is usually the fastest execution method
(see Figure 4).
In cases where most of the SQL references a
column with a high clustering_factor,
a large db_block_size
and a small avg_row_len, the
DBA will sometimes periodically re-sequence the table
rows or use a single-table cluster to maintain row order. This
approach places all adjacent rows in the same data block, removing the full-table scan and making the query up to
30x faster.
Conversely, a high
clustering_factor, where the value
approaches the number of rows in the table (num_rows),
indicates that the rows are not in the same sequence as the index, and
additional I/O will be required for index range scans. As the
clustering_factor approaches
the number of rows in the table, the rows are out of sync with the index.
However, even if a column has high
selectivity, a high clustering_factor
and small avg_row_len will indicates that the column values are randomly distributed across the
table, and additional I/O will be required
to fetch the rows. In these cases, an index range scan would cause a
huge amount of unnecessary I/O
(see Figure 5); a
full-table scan would be far more efficient.
For
queries that access common rows with a table (e.g. get all items in
order 123), unordered tables can experience huge I/O as the index
retrieves a separate data block for each row requested.

If we
group like rows together (as measured by the clustering_factor in
dba_indexes) we can get all of the row with a single block read
because the rows are together. You can use 10g hash cluster
tables, single table clusters, or manual row re-sequencing (CTAS
with ORDER BY) to achieve this goal:

In sum, the clustering_factor,
db_block_size and avg_row_len
all influence the optimizer's
decision about performing a full-table scan versus an index range
scan, and it is important to understand how these
statistics are used by the optimizer.
As we have noted, the optimizer improves
with each new release, and the latest enhancement with Oracle Database
10g is the consideration of external influences when
determining an execution plan. Oracle calls this feature
external costing
and includes both CPU and I/O cost estimates.
BC Recommended tools:

Our Ion tool is
the easiest way to analyze Oracle performance and Ion
allows you to spot hidden performance trends.
Donald K. Burleson
is one of the world's most widely-read Oracle database experts. He has written 19
books, published more than 100 articles in national magazines, and
serves as editor-in-chief of Oracle Internals, a leading Oracle
database journal. Burleson's latest book is
Creating a Self-Tuning Database
by Rampant TechPress. Don's Web sites are
http://www.dba-oracle.com ,
http://www.remote-dba.net/ and
http://rampant.cc .
|