Oracle Partitioning Review
February 11, 2004
Mark Rittman
Part of the
work I'm looking to do next week involves partitioning an Oracle 8i
data warehouse, so I've put a few notes together, and links to useful
sites, which might be useful for anyone else considering a similar
exercise. Here we go...
The first
port of call for information on Oracle 8i partitioning is the online
documentation available on OTN. According to the
Oracle 8i Data Warehousing Guide's section on
Partitioning Data:
"...
Partitioning features ... significantly enhance data access and
greatly improve overall applications performance. This is especially
true for applications accessing tables and indexes with millions of
rows and many gigabytes of data.
Partitioned tables and indexes facilitate administrative operations by
allowing these operations to work on subsets of data. For example, you
can add a new partition, organize an existing partition, or drop a
partition with less than a second of interruption to a read-only
application.
Using the partitioning methods described in this section can help you
tune SQL statements to avoid unnecessary index and table scans (using
partition pruning).
You can also improve the performance of massive
join operations when large amount of data (for example, several
million rows) are joined together by using partition-wise joins.
Finally, partitioning data greatly improves manageability of very
large databases and dramatically reduces the time required for
administrative tasks such as backup and restore."
With Oracle
8i (8.1.7 in our instance) there are three types of partition,
-
Range Partitioning, which maps data to partitions based on a range
of column values (usually a date column)
- Hash
Partitioning, which maps data to partitions based on a hashing
algorithm, evenly distributing data between the partitions. This is
typically used where ranges aren't appropriate, i.e. customer
number, product ID
-
Composite Partitioning, a combination of range partitioning further
subdivided by hash partitioning.
In our
instance, the tables that are being partitioned are all keyed by
transaction date, so our first approach will be to range partition by
date, and then, because we've got a multi-processor server, and
multiple physical disk units, try out composite partitioning to see if
we get a further performance gain.
The main
reason that partitioning is being considered for this data warehouse
is because of the
partition pruning feature.
In addition
to table partitioning, you can also partition indexes with Oracle 8i,
and the two usually are put in place as part of the same exercise.
Again, according to the Oracle 8i Data Warehousing Guide;
"Partition
pruning is a very important performance feature for data warehouses.
In partition pruning, the cost-based optimizer analyzes FROM and WHERE
clauses in SQL statements to eliminate unneeded partitions when
building the partition access list. This allows Oracle to perform
operations only on partitions relevant to the SQL statement. Oracle
does this when you use range, equality, and IN-list predicates on the
range partitioning columns, and equality and IN-list predicates on the
hash partitioning columns.
Partition pruning can also dramatically reduce the amount of data
retrieved from disk and reduce processing time. This results in
substantial improvements in query performance and resource
utilization. If you partition the index and table on different columns
(with a global, partitioned index), partition pruning also eliminates
index partitions even when the underlying table's partitions cannot be
eliminated.
On composite partitioned objects, Oracle can prune at both the range
partition level and hash subpartition level using the relevant
predicates"
This feature
is particularly useful for us as our queries typically are restricted
to a particular range of dates (the last month, the last quarter, the
last year and so on), the table contains data for several years, and
the execution plans usually involve full table scans. As part of a
two-pronged approach to dealing with performance issues, we're looking
to
- Make
sure appropriate indexes are in place,
- but
where the optimizer chooses not to use them (or cannot use them),
make the full table scans as efficient as possible
By
partitioning our fact tables by month, using range partitioning, the
hope is that only those partitions that are required to service the
query will be full table scanned - with the rest of the partitions
being 'pruned' - making us deal with only a small subset of the total
table data.
As well as
partitioning the table, you then need to decide how the accompanying
indexes are partitioned. With Oracle 8i, you've got two index
partitioning options;
-
Local indexes (created using LOCAL attribute) that are partitioned
in the same way as the associated table
-
Global indexes (created using the GLOBAL attribute) that can be
partitioned in any way
Typically, on
a data warehousing fact table range partitioned on a date column, a
local index is created on the date column. This has the advantage that
index partitions can be pruned in the same way as table partitions,
and you can maintain each index partition independently of the other
index partitions, which is particularly useful if you're dropping an
old partition, or loading data into a new partition. Global partitions
are usually used for OLTP applications where queries are very precise
and the overhead of scanning multiple local indexes could slow down
response times.
For more
background information on table and index partitioning with Oracle 8i,
this Don Burleson article for Builder.com is a good concise guide.
One point that Don picks up on that I hadn't thought of before, is
that for each individual partition you specify, you can separately
specify the storage parameters. According to the article;
"Oracle
has also included the ability for each partition to have its own
values for the internal Oracle control structures of an object. These
include PCTFREE, PCTUSED, INITTRANS, and MAXTRANS.
This
feature can be especially useful when a table is partitioned so that
only the most recent partition is updated. In this case, all of the
earlier partitions would have PCTUSED set to 100 and PCTFREE set to 0.
By allowing control over the partitions as if they were separate
tables, the DBA can “pack” static data into the Oracle blocks and save
space. The current partition would have PCTFREE set to a higher number
to allow for row expansion as the rows are updated."
Having looked
at the basics, I did a few searches on Google to find out
how partitions worked in practice, taking care to make sure that we
were looking at 8i partitioning - which presumably wouldn't be as
refined and easy to work with as partitioning in 9i and 10g.
One of the
first papers I came across was this one by Larry Miller at Oriolecorp,
entitled "Oracle
8 Partitions". The paper starts off with an explanation of Oracle
8 partitioning, explains how local and global indexes work, and then
gives some feedback and advice on Oracle 8 partitioning in practice.
The key pros and cons raised by the paper were;
"Tests
have underlined the following points :
-
Better management of the balance and physical location of data
-
Increased performance, especially for Management Reporting (when
accesses are limited, thanks to the WHERE clause, to some
partitions)
-
Increased performance with parallel DML for massive updates
-
Possibility to delete quickly a huge amount of data (purges)
-
Good maintenance functions (split, move)
Weak
points :
-
The best way to partition the data is hard to determine
-
How to index becomes nightmarish (many possible combinations,
depends on how the table has been partitioned)
- A
row cannot be migrated from a partition to another partition other
than by deleting it and reinserting it (with the correct value for
the partition key)"
Take a look at the paper if you get a chance, as there's a useful
amount of feedback on the building and maintenance of table and index
partitions, plus a good advice section at the end.
Some
additional articles I've come across that were useful in getting some
background included;
|