method_opt tips
Oracle Tips by Burleson Consulting
Updated January 25, 2015
Question: How does the method_opt part of dbms_stats work to generate histograms, and how do I use method_opt in a typical database? I only want to generate histograms when they are needed. What are the method_opt best practices?
Answer: The method_opt parameter is used to control the generation of histograms. See my research on understanding histograms and tuning with histograms.
Important: Whenever almost all of your SQL uses bind variables, you will want to delete your existing table statistics and then change method_opt to prevent future histogram creation. To delete the histograms that were created with method_opt, remove the statistics and re-analyze the tables using dbms_stats.delete_table_stats. To prevent future generation of histograms, invoke dbms_stats.set_param as follows:
exec dbms_stats.set_param(pname=>'METHOD_OPT', pval=>'FOR ALL COLUMNS SIZE 1');
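As a minimal sketch of the delete-and-re-analyze step (the SCOTT.EMP table here is hypothetical), this might look like:
begin
   -- Drop the existing statistics (and their histograms) for one table
   dbms_stats.delete_table_stats(ownname => 'SCOTT', tabname => 'EMP');
   -- Re-gather with SIZE 1 so no new histograms are built
   dbms_stats.gather_table_stats(
      ownname    => 'SCOTT',
      tabname    => 'EMP',
      method_opt => 'FOR ALL COLUMNS SIZE 1'  -- one bucket = no histogram
   );
end;
/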
The purpose of method_opt
The method_opt argument within dbms_stats controls the following:
- The generation of histograms
- The creation of extended statistics (Oracle 11g)
- The collection of "base" column statistics
Let's look at some of the many permutations of the method_opt clause.
The default method_opt "For all columns size auto"
Remember, it is wasteful to create a histogram on a column that is never referenced by a SQL statement. It is the SQL workload that determines whether a column needs histograms.
The value "for all columns size auto" is the Oracle default, and this option will generate histograms on what the optimizer considers to be "important columns". The optimizer does this by examining your current workload when you run the dbms_stats command, interrogating all current SQL in the library cache to see which columns might benefit from a histogram.
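To verify which columns actually received histograms after a gather, you can query the data dictionary; a quick sketch (assuming the SCOTT schema from the examples below) is:
-- Histogram type and bucket count for each column of a table
select column_name, histogram, num_buckets
from   dba_tab_col_statistics
where  owner = 'SCOTT'
and    table_name = 'EMP';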
The method "for all indexed columns"
The method "for all indexed columns" limits base column gathering
to only those columns that are included in an index. This value is not
recommended as it is highly unlikely that only index columns will be used in
the select list, where clause predicates, and group by clause of all of the
SQL statement executed in the environment.
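For completeness, a minimal sketch of this (not recommended) setting would look like:
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      -- limits base column statistics to indexed columns only
      method_opt       => 'for all indexed columns size auto'
   );
end;
/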
Defining extended statistics with method_opt
You can collect extended statistics directly on an expression. For instance, if we had a function SALES_CATEGORY, we might do this:
DBMS_STATS.gather_table_stats
   (ownname    => USER,
    tabname    => 'SALES',
    method_opt => 'FOR ALL COLUMNS FOR COLUMNS (sale_category(amount_sold))'
   );
One exciting feature of dbms_stats is the ability to automatically look for columns that should have histograms, and create the histograms. Multi-bucket histograms add a huge parsing overhead to SQL statements, and histograms should ONLY be used when the SQL will choose a different execution plan based upon the column value.
To aid in intelligent histogram generation, Oracle uses the method_opt parameter of dbms_stats. There are also important new options within the method_opt clause, namely skewonly, repeat and auto:
method_opt=>'for all columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto'   -- The default in 10g and beyond
In practice, there is a specific order in which to use the different options of dbms_stats. See this article for details. Let's take a close look at each method option.
The method_opt='SKEWONLY' dbms_stats Option
The first is the 'skewonly' option, which is very time-intensive because it examines the distribution of values for every column within every index. If dbms_stats discovers an index whose columns are unevenly distributed, it will create histograms for that index to aid the cost-based SQL optimizer in making a decision about index vs. full-table scan access. For example, if an index has one column value that appears in 50% of the rows, a full-table scan is faster than an index scan to retrieve these rows.
Histograms are also used with SQL that has bind variables and SQL with cursor_sharing enabled. In these cases, the CBO determines whether the column value could affect the execution plan, and if so, replaces the bind variable with a literal and performs a hard parse.
--**********************************************************
-- SKEWONLY option: Detailed analysis
--
-- Use this method for a first-time analysis for skewed indexes
-- This runs a long time because all indexes are examined
--**********************************************************
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size skewonly',
      degree           => 7
   );
end;
/
The method_opt='REPEAT' dbms_stats Option
Following the one-time detailed analysis, the re-analyze task will be less resource-intensive with the REPEAT option. Using the repeat option will only re-analyze indexes with existing histograms, and will not search for other histogram opportunities. This is the way that you will re-analyze your statistics on a regular basis.
--************************************************************
-- REPEAT option: Only re-analyze histograms for indexes
-- that have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the 'repeat' option. The repeat option
-- tells dbms_stats that no indexes have changed, and
-- it will only re-analyze histograms for indexes
-- that have histograms.
--***********************************************************
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size repeat',
      degree           => 7
   );
end;
/
The method_opt='AUTO' dbms_stats Option
The auto option is used when monitoring is implemented and creates histograms based upon data distribution and the manner in which the column is accessed by the application (e.g. the workload on the column as determined by monitoring, especially foreign keys, to determine the cardinality of table join result sets). Using method_opt=>'auto' is similar to using 'gather auto' in the options parameter of dbms_stats.
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size auto',
      degree           => 7
   );
end;
/
Remember, analyzing for histograms is time-consuming, and histograms are used under two conditions:
- Table join order: The CBO must know the size of the intermediate result sets (cardinality) to properly determine the correct join order for multi-table joins.
- Table access method: The CBO needs to know about columns in SQL where clauses, where the column value is skewed such that a full-table scan might be faster than an index range scan. Oracle uses this skew information in conjunction with the clustering_factor column of the dba_indexes view.
Hence, this is the proper order for using the dbms_stats package to locate proper columns for histograms:
1. Skewonly option - You want to use skewonly to build histograms for skewed columns, for cases where the value will make a difference between a full-table scan and an index scan.
2. Monitor - Next, turn on monitoring. Issue an 'alter table xx monitoring' and 'alter index yyy monitoring usage' command for all segments in your schema. This will monitor the workload against those segments (see the sketch after this list).
3. Auto option - Once monitoring is in place, you need to re-analyze with the "auto" option to create histograms for join columns within tables. This is critical for the CBO to determine the proper join order for finding the driving table in multi-table joins.
4. Repeat option - Finally, use the "repeat" option to re-analyze only the existing histograms.
Periodically you will want to re-run the skewonly and auto options to identify any new columns that require histograms. Once located, the repeat option will ensure that they are refreshed with current values.
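As a sketch of the monitoring step, the commands can be generated for a whole schema from the dictionary (note that the full syntax for index monitoring is 'monitoring usage'):
-- Generate MONITORING commands for every table and index in the schema
select 'alter table ' || table_name || ' monitoring;' from user_tables;
select 'alter index ' || index_name || ' monitoring usage;' from user_indexes;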
In Oracle we now see 11g extended optimizer statistics, an alternative to dynamic_sampling for estimating result set sizes. Arup Nanda has a great article on extended statistics with dbms_stats, including specialty histogram analysis using function-based columnar data.
Next, re-gather statistics on the table and collect the extended statistics on the expression upper(cust_name):
begin
   dbms_stats.gather_table_stats (
      ownname    => 'ARUP',
      tabname    => 'CUSTOMERS',
      method_opt => 'for all columns size skewonly for columns (upper(cust_name))'
   );
end;
/
Alternatively, you can define the column group as part of the gather statistics command. You do that by placing these columns in the method_opt parameter of the gather_table_stats procedure in dbms_stats as shown below:
begin
   dbms_stats.gather_table_stats (
      ownname          => 'ARUP',
      tabname          => 'BOOKINGS',
      estimate_percent => 100,
      method_opt       => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS(HOTEL_ID,RATE_CATEGORY)',
      cascade          => true
   );
end;
/
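To confirm that the column group was created, you can query the 11g dictionary; for the BOOKINGS example above:
-- Extended statistics (column groups and expressions) defined on the table
select extension_name, extension
from   dba_stat_extensions
where  owner = 'ARUP'
and    table_name = 'BOOKINGS';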
Usage notes for method_opt:
This advice comes from Greg Rahn of the Oracle Real-World Performance group:
Question: Will dbms_stats someday detect sub-optimal table join orders from a workload, and create appropriate histograms?
If histograms exist, then they were either automatically created because the columns met the criteria defined on page 10-11 of the document, or manually created. If they were created automatically, then it is probable they will influence the plan for the better.
Sub-optimal join orders are generally the result of poor cardinality estimates. Histograms are designed to help with cardinality estimates where data skew exists.
Question: Keeping statistics: what is the current "official" policy regarding statistics retention? The old conventional wisdom was that the DBA should collect a deep, representative sample and keep it, only re-analyzing when it's "a difference that makes a difference."
I don't know if there is an "official" policy per se, but I will offer my professional opinion based on experience. Start with the dbms_stats defaults. Modify as necessary based on plan performance. Use dynamic sampling and/or dbms_sqltune or hints/outlines where appropriate (probably in that order). Understand the problem before attempting solutions.
There are a couple of cases that I would be mindful of:
1) Experience has shown that poor plans can be a result of an underestimated NDV with skewed data and DBMS_STATS.AUTO_SAMPLE_SIZE (or too small a sample). This has been addressed/enhanced in 11g. In 10g it requires choosing a fixed sample size that yields an accurate enough NDV to get the optimal plan(s). The sample size will vary case by case, as it is data dependent.
2) Low/high value issues on recently populated data, specifically with partitioned tables. If the partition granule size is small (say daily or smaller), the default 10% staleness threshold might be too little. It may be best to gather partition stats immediately after loading, or set them manually. It's better to have stats that are an overestimate of the number of rows/values than an underestimate. For example, it's better to have a hash join on a small set of data than a nested loop on a large set.
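As a sketch of the gather-after-load approach (the table and partition names here are hypothetical):
-- Gather statistics for just the freshly loaded partition
begin
   dbms_stats.gather_table_stats(
      ownname     => 'SCOTT',
      tabname     => 'SALES',
      partname    => 'SALES_P_20150125',
      granularity => 'PARTITION'
   );
end;
/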
Oracle 11g Updates:
In Oracle we now see 11g extended optimizer statistics, an alternative to dynamic_sampling for estimating result set sizes. Oracle guru Guy Harrison also offers this advice for 11g statistics collection on function-based index columns.
In 11g, I think there are two other ways to get statistics collected for indexed expressions:
1) Collect extended statistics directly on the expression. So for instance, if we had a function SALES_CATEGORY, we might do this:
DBMS_STATS.gather_table_stats
   (ownname    => USER,
    tabname    => 'SALES',
    method_opt => 'FOR ALL COLUMNS FOR COLUMNS (sale_category(amount_sold))'
   );
2) Create a virtual column on the expression, then index that column. So for the same example as above we might create the following virtual column, then index the column and collect stats as usual:
ALTER TABLE sales
   ADD sales_category GENERATED ALWAYS AS (sale_category(amount_sold));
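Following through on "index the column and collect stats as usual", a minimal sketch might be (the index name is hypothetical):
-- Index the virtual column, then gather table stats as usual
CREATE INDEX sales_category_ix ON sales (sales_category);

begin
   dbms_stats.gather_table_stats(ownname => USER, tabname => 'SALES');
end;
/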
I think I like the first method better, because the statistics will still exist even if the index is dropped and, unlike the second approach, it doesn't change the logical structure of the table.
Also see:
Oracle speaks on 10g
migration tips