The Oracle SQL optimizer has always been ignorant of the implied
relationships between data columns within the same table. While the
optimizer has traditionally analyzed the distribution of values
within a column, he does not collect value-based relationships
between columns.
See these important notes on
the conceptual background for extended optimizer statistics.
Functions vs. expressions
It is not a secret that an index must exactly match a where
clause predicate, and it was this issue that led to the development
of function-based indexes.
The function-based index are especially important for DATE
datatypes:
create index mydate_idx on
mytab(mydate);
-- index is used
select * from mytab where mydate > sysdate-90;
-- index is not used
select * from mytab where to_char(mydate,'MON')='MARCH';
Expression-based data relationships
Unlike function on data columns, expressions are implied and
require knowledge of the application. Because human knowledge
is required, expression
For example, analyzing a table within dbms_stats will not tell
the optimizer about implied data relationships:
- A specific city_name column correlates to
specific state_name column values.
- A zip_code column value
is only NOT NULL when country_code= 'USA'.
- A province name
IS NULL when country_code='USA'.
- A current_age column is
related to the date_of_birth.
- The senior_citizen_flag='Y'
correlates to rows where SYSDATE-date_of_birth > 65.
- The
zodiac_sign column directly relates to the value of month_of_birth
In addition, your specific application may have unique
relationships between column values:
- A credit_status of 'BAD' indicates a credit_rating values below 400
- A lawsuit_status of
'overturned' corresponds with an appeal='YES' column value.
-
A non_zero value for prison_sentence corresponds to a verdict value
of 'GUILTY'.
Without knowing these column value correlations, the optimizer
might make an incorrect cardinality estimate, leading to a
sub-optimal execution plan.
If Oracle knew about the
relationships between the columns, he (*1) could look into the
distribution of both columns and come-up with a more accurate
estimate of extended rows returned, leading to 'better' execution
plans.
Obviously, the DBA must know about these unique data
relationships. The more the DBA knows their data, the better
they can help tip-off the optimizer about these 'hidden
correlations' between column values.
Expression statistics can be created in two ways:
- Using the method_opt argument in dbms_stats
- Invoking the dbms_stats create_extended_stats procedure
Using dbms_stats.create_extended_stats
Let's take a closer look and understand how the
11g extended dbms_stats data helps the optimizer make better
guesses of result set sizes.
The new 11g dbms_stats package has
several new procedures to aid in supplementing histogram data, and
the state of these extended histograms can be seen in the
user_tab_col_statistics view:
-
dbms_stats.create_extended_stats
-
dbms_stats.show_extended_stats_name
-
dbms_stats.drop_extended_stats
You can use the dbms_stats.create_extended_stats package
to tip-off the optimizer about an implied relationship between
columns in a table. To allow the human DBA to 'tip-off' the
optimizer and make it more intelligent, Oracle created 'extended
statistics' implemented via the dbms_stats.create_extended_stats
procedure.
The dbms_stats.create_extended_stats procedure allows
you to analyze groups of columns, instead of independent columns.
You invoke dbms_stats.create_extended_stats like this:
dbms_stats.create_extended_stats (
ownname
varchar2,
tabname varchar2,
extension varchar2)
return varchar2;
The important argument is the 'extension', because it specifies
the relationship between the columns. The dbms_stats.create_extended_stats extension can be either an
expression or a group of columns. For example, you can specify
the extension as a set of column names (prison_sentence, verdict),
or you can use an expression (sale_price+sales_tax).
Creating extended statistics
Here are the steps to create extended statistics for related
table columns with dbms_stats.created_extended_stats:
1 - The first step is to create column histograms
for the related columns.
2: Next, we run dbms_stats.create_extended_stats to relate the columns together.
Unlike a traditional procedure that is invoked via an
execute ('exec') statement, Oracle extended statistics are created
via a select statement.
Let's take a closer look and see how
to implement extended column statistics. Below, we might
create the column histograms and column relationships where the
total_price column relates to the row expression where
total_price = product_price+sales_tax:
-- ********************************
-- Create column
histograms
--
********************************
exec dbms_stats.gather_table_stats
(null, 'sales' method_opt=> 'for
all columns size skewonly);
-- ********************************
-- Verify existence of histograms
-- ********************************
select
column_name,
histogram
from
user_tab_sol_statistics
where
table_name = 'SALES';
--
****************************************
- Create the extended optimizer statistics
-- ****************************************
select
dbms_stats.create_extended_stats
(NULL, 'sales', '(product_price+sales_tax)')
from dual;
Alternatives to dbms_stats.create_extended_stats
Oracle guru Guy
Harrison also offers this advice for advanced statistics collection
where he computes extended statistics directly with
dbms_stats.gather_table_stats by specifying a column function.
Collect extended statistics directly on the expression. So for instance, if we had a function SALES_CATEGORY, we might do
this: "
DBMS_STATS.gather_table_stats
(ownname => USER,
tabname => 'SALES',
method_opt => 'FOR ALL COLUMNS FOR COLUMNS
(sale_category(amount_sold))'
);
Scripts to display extended statistics
Dr. Tim Hall provides these example queries for extended
statistics:
--
*********************************************************
--
Display extended statistics extensions
--
*********************************************************
column extension
format a30
select
extension_name,
extension
from
dba_stat_extensions
where
table_name = 'EMP';
EXTENSION_NAME
EXTENSION
------------------------------ ------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6 ("JOB","DEPTNO")
SYS_STULPA1A#B6YL4KQ59DQO3OADQ ("JOB","MGR")
SYS_STU2JLSDWQAFJHQST7$QK81_YB
(LOWER("ENAME"))
SYS_STUOK75YSL165W#_X8GUYL0A1X (UPPER("ENAME"))
--
*********************************************************
--
Display extended statistics distinct values and histograms
--
*********************************************************
column col_group format a30
select e.extension col_group,
t.num_distinct,
t.histogram
from
dba_stat_extensions e
join
dba_tab_col_statistics t on e.extension_name=t.column_name
and t.table_name = 'EMP';
COL_GROUP
NUM_DISTINCT HISTOGRAM
------------------------------ ------------
---------------
("JOB","DEPTNO")
9 NONE
("JOB","MGR")
8 NONE
(LOWER("ENAME"))
14 NONE
(UPPER("ENAME"))
14 NONE
--
*********************************************************
--
Display extended statistics distinct values and histograms
--
*********************************************************
column col_group
format a30
select e.extension col_group,
t.num_distinct,
t.histogram
from
dba_stat_extensions e
join
dba_tab_col_statistics t on e.extension_name=t.column_name
and t.table_name = 'EMP';
COL_GROUP
NUM_DISTINCT HISTOGRAM
------------------------------ ------------
---------------
("JOB","DEPTNO")
9 NONE
("JOB","MGR")
8 NONE
(LOWER("ENAME"))
14 NONE
(UPPER("ENAME"))
14 NONE
Extended Statistics Usage Notes: The Oracle
documentation notes these limitations on the
dbms_stats.create_extended_stats extension argument:
- The extension cannot contain a virtual column.
- Extensions cannot be created on tables owned by SYS.
-
Extensions cannot be created on cluster tables, index organized
tables, temporary tables or external tables.
- The total
number of extensions in a table cannot be greater than a maximum of
(20, 10% of number of non-virtual columns in the table).
-
The number of columns in a column group must be in the range [2,
32].
- A column cannot appear more than once in a column
group.
- A column group can not contain expressions.
-
An expression must contain at least one column.
- An
expression cannot contain a subquery.
- The COMPATIBLE
parameter needs to be 11.0.0.0.0 or greater
*1: Ever since the 1970's IMS databases, all database
management systems have been referred to as masculine, such as 'When
Oracle complies a precompiler, he checks the semantics'. This
is database tradition, not sexist! For details, see
why is Oracle male?
|
|
Get the Complete
Oracle SQL Tuning Information
The landmark book
"Advanced Oracle
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
buy it
for 30% off directly from the publisher.
|