Overview of intelligent indexing
- It is a myth that indexes must be persistent at all times. In reality, it is more efficient for some indexes to be created and destroyed as needed by queries.
- The factors that affect index persistence are the size of the table and the frequency with which the SQL invokes the index.
- For example, an index that is used only once per month may be better created immediately before use.
- Unused indexes are expensive to maintain: DML is slower, and this can be significant on busy systems. Dynamic systems commonly have end-of-period jobs, some end-of-day (EOD), end-of-week (EOW), end-of-month (EOM), end-of-quarter (EOQ) and end-of-year (EOY).
- It is not uncommon to have large batch jobs that run during these periods and use an index only during the period, and at no other time.
- In high-DML databases with long periods between index usage, it is most effective to build the index, run the job, and then destroy the index, rather than constantly maintaining the index (see the sketch after this list).
- This concept is especially true for bitmap and Con*text indexes, which have a huge cost of DML maintenance. Many bitmap and Con*text indexes only exist during end-of-period processing (when the database is read-only).
In order to be effective at just-in-time indexing, you must be able to weigh the costs against the benefits:
- The cost is the sum of the additional work required to keep the index current during DML operations. You can estimate this conservatively as one disk write per DML statement. If an unused index experiences 12,000,000 DML statements, then we can estimate a cost of 12 million disk I/Os on the system.
- There is also the additional cost of locking when an in-use index is updated.
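As a minimal sketch of the just-in-time approach (the table, index, and tablespace names here are hypothetical), the end-of-period job simply brackets its work with a create and a drop:

-- Build the bitmap index immediately before the read-only EOM reports run
create bitmap index sales_region_bmx
   on sales (region_code)
   tablespace eom_ts;

-- ... run the end-of-month reports that reference sales.region_code ...

-- Destroy the index so that it adds no DML overhead during normal processing
drop index sales_region_bmx;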
Oracle has a wealth of features that allow the DBA and developer to fetch data from an Oracle database with a minimum of trips to disk.
Ever since direct access storage devices (DASD) were introduced a half-century ago, computer scientists have recognized that disks have addresses, and that indexes can be built on any disk file. The indexed sequential access method (ISAM) worked by creating a new sorted file that contained the symbolic key and the disk address. As the first large-scale commercial databases were introduced (IMS, IDMS), commercial databases started offering choices of indexing methods. These early databases allowed the DBA to create tree indexes, and allowed the DBA to control the size of the index nodes.
While the concept of an index is straightforward, the implementation of an intelligent Oracle indexing strategy can be extremely complex. Oracle Database 11g offers many types of indexes, but the usage of indexes depends upon the shifting SQL workloads. The vast majority of OLTP shops experience vastly different workloads for daytime transaction processing, evening batch jobs, and end-of-period reporting, and it is these workloads that drive index usage.
Oracle does not report on wasted indexing resources, and there are many perils to over-indexing and under-indexing for any given SQL workload. Just as a missing index forces unnecessary large-table full-table scans, maintaining an unused index wastes a huge amount of computing resources.
In many cases, indexes that are only used periodically are expensive and inefficient to maintain while they are not being used, and it is far more cost-effective to create and destroy indexes as needed, based on the needs of each specific SQL workload.
Just as manufacturing has implemented just-in-time delivery of raw materials, an intelligent DBA strategy allows for specific indexes, optimized instance parameter settings, and customized optimizer statistics to be delivered immediately before a shift in the workload.
Using data from real-world production databases, this presentation describes a proven approach for intelligent indexing:
- Understand the costs and benefits of indexes: The DBA needs to understand the relative merits and costs of sorted hash clusters, index-organized tables, b-tree indexes, bitmap indexes, and multi-column indexes. After reviewing these, the DBA will learn how to match the appropriate index type to a specific SQL workload.
- Intelligent index enhancement: The Oracle DBA needs to know how to add human intelligence to tip off the SQL optimizer. This includes optimizer parameters such as optimizer_index_caching, intelligently creating column histograms, and deploying Oracle Database 11g extended optimizer statistics, tools a smart DBA can use to improve the efficiency of the SQL optimizer.
- Identify repeating workloads: The DBA will use SQL execution plan feedback from AWR and STATSPACK to identify distinct repeating workloads. These workloads are then named (e.g. day_mode, night_mode, eow_mode, eom_mode), and a profile of index usage is gathered from historical data.
- Profile index usage for each workload: Next, we determine the optimal indexing strategy for each workload, counting the number of times per minute that each index is invoked and plotting usage patterns for multi-column indexes.
- Intelligently schedule instance customization: Based on the usage of each index, we will identify duplicate indexes, unused indexes, and those indexes that are unique to each workload. We can then use the dbms_scheduler utility to customize the Oracle instance to meet the demands of each distinct workload, as sketched below.
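As an illustration, dbms_scheduler can fire a job that reshapes the instance just before a known workload shift. This is only a sketch; eom_mode_setup is a hypothetical procedure that would build the EOM indexes, import the EOM statistics, and adjust the instance parameters:

begin
   dbms_scheduler.create_job(
      job_name        => 'switch_to_eom_mode',
      job_type        => 'PLSQL_BLOCK',
      job_action      => 'begin eom_mode_setup; end;',  -- hypothetical procedure
      repeat_interval => 'FREQ=MONTHLY; BYMONTHDAY=-1; BYHOUR=18',  -- last day of month, 6pm
      enabled         => TRUE);
end;
/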
This presentation also reveals secret scripts to track which indexes are the most popular over time and how often each index is invoked by SQL. We also explore how to track which columns of a multi-column index are referenced by your SQL. In sum, this presentation is indispensable for any DBA who must manage a mission-critical corporate database with limited computing resources.
Oracle & Indexing
As Oracle came to dominate the data processing world, it began to offer a wealth of indexing strategies. We need to start by distinguishing between 'logical' and 'physical' indexing structures.
Note: Oracle indexes can cause 'extra' I/O operations, and the choice to use indexes is dependent upon your choice of optimizer_mode. Not all database transactions will benefit from index usage, and sometimes you want your SQL optimizer to optimize for fast throughput instead of fast response time.
The logical indexing strategies are deployed at the analytical level and do not involve any knowledge of the underlying physical database structures:
Logical Indexes:
Oracle Physical Indexes
Oracle is the world's most flexible and robust database because it offers almost every conceivable algorithm for speeding up access to data:
Physical Index structures:
While b-tree indexes are great for simple
queries, they are not very good for the following situations:
- Low-cardinality columns: Columns with only a tiny number of distinct values do not have the selectivity that is required in order to benefit from standard b-tree index structures.
- No support for SQL functions: Standard b-tree indexes cannot support SQL queries that apply Oracle's built-in functions to the indexed column; this is the gap that function-based indexes fill.
Bitmap Indexes
Bitmap indexes are great for low-cardinality columns, but they have a huge overhead for DML, and most shops drop and rebuild bitmaps during off-hours processing. Hence, bitmap indexes are not for active OLTP databases, and they are used primarily for read-only DSS or data warehouse systems. Bitmaps, while not very useful by themselves, are great when combined together.
For example, assume there is a motor vehicle database with numerous low-cardinality columns such as car_color, car_make, car_model, and car_year. Each column contains fewer than 100 distinct values by itself, and a b-tree index would be fairly useless in a database of 20 million vehicles. However, combining these indexes together in a query can provide blistering response times, far faster than the traditional method of reading each of the 20 million rows in the base table. For example, assume we wanted to find old blue Toyota Corollas manufactured in 1981:
select
   license_plate_nbr
from
   vehicle
where
   color = 'blue'
and
   make = 'toyota'
and
   year = 1981;
Oracle uses a specialized optimizer method called a bitmapped index merge to service this query. In a bitmapped index merge, each Row-ID (RID) list is built independently by using the bitmaps, and a special merge routine is used in order to compare the RID lists and find the intersecting values. Using this methodology, Oracle can provide sub-second response time when working against multiple low-cardinality columns:

Figure: Oracle bitmap merge join
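Assuming the vehicle table above, the supporting bitmap indexes might be created as follows (a sketch; the index names are invented):

create bitmap index vehicle_color_bmx on vehicle (color);
create bitmap index vehicle_make_bmx  on vehicle (make);
create bitmap index vehicle_year_bmx  on vehicle (year);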
Does Size Matter?
To meet the requirements of all operating systems from a mainframe to a Mac, Oracle has chosen to implement the index node size based upon the DBA's choice of database block size. An index built in a 32k block size (cached in db_32k_cache_size) will contain more keys per index node than the same index created in a 2k block size.
While the vast majority of indexes will reside in your database's default db_cache_size, there are exceptions, and some large OLTP databases have shown great performance improvements when indexes are moved into a blocksize that matches the I/O characteristics of the index. For example, during a nested loop join, one index may receive multi-block reads (index range scans) while the other index is doing unique index probes. In these cases, moving the index that experiences multi-block reads into a large blocksize allows more keys to be read in a single data buffer get.
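A minimal sketch of this technique, in which the file path, sizes, and index name are placeholders for your own (the 32k buffer cache must be configured before a 32k tablespace can be created):

alter system set db_32k_cache_size = 64m;

create tablespace idx_32k_ts
   datafile '/u01/oradata/mydb/idx_32k_01.dbf' size 1g
   blocksize 32k;

alter index cust_name_idx rebuild tablespace idx_32k_ts;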
Workload-driven index usage
It's backwards, but at implementation time the DBA may not be aware of the SQL workload that will run against their tables, and this lack of a-priori knowledge of the SQL can lead to the phenomenon of 'missing indexes', especially function-based indexes. Hence, savvy DBAs adopt a workload-based approach, examining their workloads for I/O signatures and implementing indexing strategies that match the workload. Most production systems have at least two workload modes: a 'day mode' characterized by small OLTP SQL optimized for fast response time, and a 'night mode' characterized by large batch jobs that need to be optimized for fast throughput.
                  Day Mode             Night Mode      EOM Mode
Data Warehouse    read-only            batch updates   summary reports
OLTP              small transactions   batch reports   summary reports
An iterative process of intelligent SQL workload tuning
involves these steps:
- Analyze the workload: Use monitoring tools to identify repeating processing signatures (see the sample query after this list)
- Codify and 'name' the workload signatures (day mode, night mode, EOW mode, EOM mode)
- Identify optimal settings for each workload (init.ora parameters, SGA regions, CBO statistics, and custom indexes)
- Schedule just-in-time morphing of Oracle to accommodate the changing workload
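As one way to expose repeating signatures, the AWR history can be summarized by hour of day. This sketch (which assumes a Diagnostics Pack license for the AWR views) averages the physical read rate for each hour across all retained snapshots:

select
   to_char(s.begin_interval_time,'hh24')  hour_of_day,
   round(avg(m.average),1)                phys_reads_per_sec
from
   dba_hist_snapshot          s,
   dba_hist_sysmetric_summary m
where
   s.snap_id = m.snap_id
and
   m.metric_name = 'Physical Reads Per Sec'
group by
   to_char(s.begin_interval_time,'hh24')
order by
   1;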
End of Period Workloads
In addition to the normal day mode and night mode workloads, many systems have special end-of-week, end-of-month, and end-of-quarter processing that requires an entirely new SGA and database configuration.
- PGA adjustments: You can override the governor that allows any single job to grab only 5% of the PGA, and you can set very large values for sort_area_size and hash_area_size (with workarea_size_policy set to manual) to allow a single-threaded batch job to consume all of the PGA, if necessary.
- Optimizer parameters: Certain parameters like optimizer_mode and optimizer_index_caching can change the propensity of Oracle to use indexes.
- Statistics: Customized EOM CBO statistics can be collected, saved, and imported when you switch to EOM mode (see the sketch after this list). This is especially important for cases where specialized 'weekend-only' indexes are used and optimizer histograms are required.
- SGA adjustments: When you have a batch window in which only a few concurrent jobs will run simultaneously, you can adjust the 'default governors' in Oracle to allow the batch jobs to consume all available computing resources.
- B-tree indexes: Some indexes are only used for EOW processing, and it's smart to build them on Friday night and nuke the index on Sunday night.
- Bitmap indexes: Many EOM reports are read-only, and building temporary bitmap indexes can greatly improve the throughput for jobs that may run more than 10 hours.
- Con*Text indexes: Many EOY reports need to 'look inside' CLOB columns, and a Friday night build of a word index on a CLOB is perfect for read-only weekend reports. The index is destroyed on Sunday night.
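A sketch of the statistics piece, using the standard dbms_stats procedures (the SALES schema and EOM_STATS table names are hypothetical):

-- One-time setup: a table to hold the saved EOM statistics
exec dbms_stats.create_stat_table(ownname => 'SALES', stattab => 'EOM_STATS');

-- After the EOM indexes and histograms are in place, save the statistics
exec dbms_stats.export_schema_stats(ownname => 'SALES', stattab => 'EOM_STATS');

-- When switching into EOM mode, restore the saved statistics
exec dbms_stats.import_schema_stats(ownname => 'SALES', stattab => 'EOM_STATS');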
Proactive workload analysis and predictive modeling
Statistically proven predictive analytic procedures come with the Oracle Data Mining (ODM) package, but DBAs are only now learning that these same principles can be applied to their own production SQL workloads. It is possible to fix problems before they affect your users! All you need is a way to interrogate your historical data from STATSPACK or AWR and identify repeating 'signatures', statistically significant repeating patterns of usage that can be used to predict future events!
'Those who forget the past are condemned to repeat it' - George Santayana
'The farther that you can look into the past, the farther you can look into the future' - Winston Churchill
'Predictions are difficult, especially about the future' - Yogi Berra
If you closely study your past workloads, it's easy to develop a proactive approach that will intelligently anticipate new repeating workloads and change the SGA to accommodate them before they impact the end-users. Oracle offers a similar one-size-fits-all approach called Automatic Memory Management (AMM), but AMM is reactive and waits until a problem is detected before fixing it.
SQL workload
Starting in Oracle 10g, Oracle finally recognized that it is the SQL
workload that determines the optimal index placement.
- Intelligent index management
- Identifying missing indexes
- Identifying unused indexes
- Interrogating index usage within a workload
- Building custom indexes for workloads
- Using function-based indexes
Interrogating index usage within a workload
index_usage_hr.sql
col c1 heading 'Begin|Interval|time' format a20
col c2 heading 'Search Columns'      format 999
col c3 heading 'Invocation|Count'    format 99,999,999

break on c1 skip 2

accept idxname char prompt 'Enter Index Name: '

ttitle 'Invocation Counts for index|&idxname'

select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24') c1,
   p.search_columns                                c2,
   count(*)                                        c3
from
   dba_hist_snapshot sn,
   dba_hist_sql_plan p,
   dba_hist_sqlstat  st
where
   st.sql_id = p.sql_id
and
   sn.snap_id = st.snap_id
and
   p.object_name = '&idxname'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24'),
   p.search_columns;
This will produce an output like the listing below, showing a summary count of invocations of the specified index during each snapshot interval. This can be compared to the number of times that the table was invoked from SQL. Here is a sample of the output from this script:
Interval                            Invocation
time                 Search Columns      Count
-------------------- -------------- -----------
04-10-21 15                       1           3
04-10-10 16                       0           1
04-10-10 19                       1           1
04-10-11 02                       0           2
04-10-11 04                       2           1
04-10-11 06                       3           1
04-10-11 11                       0           1
04-10-11 12                       0           2
04-10-11 13                       2           1
04-10-11 15                       0           3
04-10-11 17                       0          14
Using function-based indexes
Here we see a function-based index (FBI) on a large VARCHAR2 column, based on the credit card number that it contains. The database will create this index, but even with a hint the optimizer will not use it, and it is expensive to maintain.
create index card_idx on dumbtbl
   (regexp_substr(lots_data, '(([0-9]{4})([[:space:]])){3}[0-9]{4}'));
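By contrast, a function-based index that matches a predicate the SQL actually uses can be very effective. A minimal sketch, with an invented customer table:

create index cust_upper_name_idx
   on customer (upper(cust_name));

-- This query can now use the function-based index
select cust_id from customer where upper(cust_name) = 'JONES';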
Table row re-sequencing and index usage
The original relational manifesto by Codd and Date stated that the physical ordering of rows on disk should not matter, but in the real world, row placement matters a lot, especially for tables whose rows are always accessed in the same key order. The dba_indexes view has a clustering_factor column that tells the optimizer how synchronized an index is with its table. When clustering_factor approaches the number of table blocks, the rows are synchronized with the index:

At the other extreme, when clustering_factor approaches num_rows (not synchronized), range scan operations may require hundreds more I/Os to complete a transaction:

Many DBAs will search the workload and seek
tables that are frequently accessed by the same index, using an
index range scan.
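As a quick check, clustering_factor can be compared directly to the table's blocks and num_rows (the CUSTOMER table name here is a placeholder):

select
   i.index_name,
   i.clustering_factor,
   t.blocks,
   t.num_rows
from
   dba_indexes i,
   dba_tables  t
where
   i.table_name  = t.table_name
and
   i.table_owner = t.owner
and
   t.table_name  = 'CUSTOMER';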
Here is an unclustered index situation. This transaction displays all customers in a city:
- Only 10 rows per data block
- Clustering factor is 'poor' for the primary index
- Lots of 'unnecessary' disk reads to fetch all customers for a city
Figure: Unclustered index situation
Below is a clustered index situation: 400 rows fit onto each data block.

As we see, the decision to reorganize a table into the same order as an index depends on several factors:
- The percentage of total accesses in which the index is used to fetch the table rows. Remember, a table can only be re-sequenced according to one index, and it must be the most popular index.
- Whether the index is accessed with index full scans, index fast full scans, or index range scans. Remember, some indexes are used only for 'unique' index probes, and these will see no benefit from reorganization.
- The database block size (db_nn_cache_size)
- The average row length (dba_tables.avg_row_len)
Typical cases might include:
- Order items: An order processing system displays all items for an order 1,000 times per hour, with an average of 40 consistent gets per execution. System-wide, the item_idx is used 95% of the time to show the items within an order. Because there are about 70 items per order, and the item rows are 80 bytes each, reorganizing the item table by item_idx will reduce I/O by a factor of up to 40x.
- Student classes: Assume a university database with a 2k block size that displays all classes for a student 10 times per second. The class table is using the student_semester_idx 97% of the time with index range scans, and a student takes an average of 4 classes per semester.
What factors should we consider when
re-organizing this table?
Oracle provides several methods for allowing the DBA to re-sequence table rows into the same order as the most popular index, as sketched below.
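Common approaches include a CTAS rebuild with ORDER BY, single-table index clusters, and sorted hash clusters. A minimal sketch of the CTAS approach, assuming a hypothetical customer table whose rows should be re-sequenced in city order:

-- Rebuild the table in index-key order, then swap it in
create table customer_sorted
as
select * from customer
order by customer_city;

-- (then rename the tables, rebuild the indexes, and re-gather statistics)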
Oracle MOSC Note 223117.1 has some great advice for tuning down 'db file sequential read' waits by reorganizing the table in row order:
- If index range scans are involved, more blocks than necessary could be being visited if the index is unselective: by forcing or enabling the use of a more selective index, we can access the same table data by visiting fewer index blocks (and doing fewer physical I/Os).
- If the index being used has a large clustering factor, then more table data blocks have to be visited in order to get the rows in each index block: by rebuilding the table with its rows sorted by the particular index columns, we can reduce the clustering factor and hence the number of table data blocks that we have to visit for each index block.
Conclusion
Even though Oracle has introduced many automated features for locating missing indexes, no artificial tool will ever be able to replace the intuition of a human expert. Those who forget the past are condemned to repeat it, and it is the job of every database administrator to analyze their historical performance data to identify workload-based indexing opportunities. By applying an intelligent approach to Oracle indexing, you can ensure that your indexes match the data access patterns and prevent wasting computing resources on maintaining unnecessary indexes.