 |
|
Using Arrays for Lookup
Tables
Oracle Tips by Burleson Consulting
|
The following Tip is from the
outstanding book "Oracle
PL/SQL Tuning: Expert Secrets for High Performance Programming" by
Dr. Tim Hall, Oracle ACE of the year, 2006:
Most systems contain a number of small static
lookup tables that are accessed regularly during the lifetime of a
session. In the case of batch jobs, these simple lookups may be
performed many thousands of times in one cycle of the job. In
some situations performance can be improved drastically by caching
this lookup data in PL/SQL tables rather than constantly referencing
the base table.
The create_cached_lookup_tab.sql script
creates and populates a test table which is used by the example code
in this section.
create_cached_lookup_tab.sql
DROP TABLE
cached_lookup_tab;
CREATE TABLE
cached_lookup_tab (
id NUMBER(10)
NOT NULL,
description VARCHAR2(50) NOT NULL
);
ALTER TABLE
cached_lookup_tab ADD (
CONSTRAINT cached_lookup_tab_pk
PRIMARY KEY (id)
);
DECLARE
TYPE t_tab IS TABLE OF NUMBER;
l_tab t_tab := t_tab();
BEGIN
FOR i IN 1 .. 1000 LOOP
l_tab.extend;
l_tab(l_tab.last) := i;
END LOOP;
FORALL i IN l_tab.first .. l_tab.last
INSERT INTO cached_lookup_tab (id, description)
VALUES (l_tab(i), 'Description for ' || TO_CHAR(l_tab(i)));
COMMIT;
END;
/
EXEC
DBMS_STATS.GATHER_TABLE_STATS(USER, 'cached_lookup_tab', cascade =>
TRUE);
Once the test table is created, a PL/SQL API
can then be created that will cache the lookup data and allow user
access. For the purpose of this test, the following procedures
are used:
populate_tab – This procedure populates
an associative array (index-by table) with the data from the
cached_lookup_tab table.
get_cached_info – This procedure
returns the specified lookup information from the cached table.
get_db_info – This procedure returns
the specified lookup information from the database table, not the
cached table.
Comparing the relative speeds of the
get_cached_info and get_db_info procedures gives an indication of the
performance improvements associated with caching the lookup data.
The cached_lookup_api.sql script shown below creates a package
specification and body that implements these three procedures.
cached_lookup_api.sql
CREATE OR
REPLACE PACKAGE cached_lookup_api AS
PROCEDURE
populate_tab;
PROCEDURE
get_cached_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE);
PROCEDURE
get_db_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE);
END cached_lookup_api;
/
SHOW ERRORS
CREATE OR
REPLACE PACKAGE BODY cached_lookup_api AS
TYPE t_tab
IS TABLE OF cached_lookup_tab%ROWTYPE
INDEX BY BINARY_INTEGER;
g_tab t_tab;
--
-----------------------------------------------------------------
PROCEDURE populate_tab AS
-- -----------------------------------------------------------------
BEGIN
FOR i IN (SELECT *
FROM
cached_lookup_tab)
LOOP
g_tab(i.id) := i;
END LOOP;
END populate_tab;
-- -----------------------------------------------------------------
--
-----------------------------------------------------------------
PROCEDURE get_cached_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE) AS
-- -----------------------------------------------------------------
BEGIN
p_info := g_tab(p_id);
END get_cached_info;
-- -----------------------------------------------------------------
--
-----------------------------------------------------------------
PROCEDURE get_db_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE) AS
-- -----------------------------------------------------------------
BEGIN
SELECT *
INTO p_info
FROM cached_lookup_tab
WHERE id = p_id;
END get_db_info;
-- -----------------------------------------------------------------
BEGIN
-- Load table during package initialization.
populate_tab;
END cached_lookup_api;
/
SHOW ERRORS
The package body contains an initialization
block. This means the populate_tab procedure is executed during
package initialization. As a result during the lifetime of the
session, the populate_tab procedure is never called manually unless a
refresh of the cached table is required.
Now that a test table and an API to access the
data is available, a batch process that requires the lookup
information needs to be simulated. The
cached_lookup_ap_testi.sql script does this by repeatedly executing
the lookup procedures, passing in random ID values.
cached_lookup_api_test.sql
--
*****************************************************************
-- Parameters:
-- 1) loops – The number of loop iterations.
-- *****************************************************************
SET VERIFY
OFF
SET SERVEROUTPUT ON
DECLARE
l_seed BINARY_INTEGER;
l_start NUMBER;
l_loops NUMBER := &1;
l_id cached_lookup_tab.id%TYPE;
l_row cached_lookup_tab%ROWTYPE;
BEGIN
--
Seed the random number generator.
l_seed := TO_NUMBER(TO_CHAR(SYSDATE,'YYYYDDMMSS'));
DBMS_RANDOM.initialize (val => l_seed);
--
Time the cached lookup.
l_start := DBMS_UTILITY.get_time;
FOR i
IN 1 .. l_loops LOOP
l_id := TRUNC(DBMS_RANDOM.value(low => 1, high => 1000));
cached_lookup_api.get_cached_info(p_id => l_id,
p_info => l_row);
END LOOP;
DBMS_OUTPUT.put_line('Cached Lookup (' || l_loops || ' rows): ' ||
(DBMS_UTILITY.get_time - l_start));
--
Time the db lookup.
l_start := DBMS_UTILITY.get_time;
FOR i
IN 1 .. l_loops LOOP
l_id := TRUNC(DBMS_RANDOM.value(low => 1, high => 1000));
cached_lookup_api.get_db_info(p_id => l_id,
p_info => l_row);
END LOOP;
DBMS_OUTPUT.put_line('DB Lookup (' || l_loops || ' rows)
: ' ||
(DBMS_UTILITY.get_time - l_start));
DBMS_RANDOM.terminate;
END;
/
The random values are generated using the
dbms_random package, and all timings are calculated in hundredths of a
second using the dbms_utility.get_time function. The
cached_lookup_ap_testi.sql script accepts a parameter that indicates
the number of calls to the lookup procedures that should be performed.
The following output shows the results for several loop sizes.
SQL> @cached_lookup_api_test.sql
100
Cached Lookup (100 rows): 0
DB Lookup (100 rows) : 2
PL/SQL
procedure successfully completed.
SQL> @cached_lookup_api_test.sql
1000
Cached Lookup (1000 rows): 2
DB Lookup (1000 rows) : 18
PL/SQL
procedure successfully completed.
SQL> @cached_lookup_api_test.sql
10000
Cached Lookup (10000 rows): 12
DB Lookup (10000 rows) : 181
PL/SQL
procedure successfully completed.
SQL> @cached_lookup_api_test.sql
100000
Cached Lookup (100000 rows): 127
DB Lookup (100000 rows) : 1807
PL/SQL
procedure successfully completed.
The results show that the cached lookup faster
than the table lookup, which makes a strong case for its usage.
Before launching into a full scale rewrite of
your existing code base, the following issues and how they relate to
your system should be considered:
Data Volatility – Caching data is
generally a bad idea with respect to data integrity because there is
no way of knowing if the underlying data has been altered since it was
cached. As a result, this type of processing should be limited
to static tables. In some circumstances, the benefits of using
this method offset the possible risks for medium volatility tables,
but this should be judged on a case-by-case basis.
Memory Usage – The number and size of
cached tables should be considered when using this method.
PL/SQL collections are held in memory, so large tables or large
numbers of small tables may result in considerable memory
requirements. In addition, this data is session-specific so the
memory requirements are multiplied by the number of sessions caching
data. It may be wise to restrict the caching of data to a few
select processes, like batch operations.
Initialization Time – This
method, because of the time it takes to cache the data, increases
package initialization time. For this reason, functions
requiring the cached data should be separated from those that do not.
This way the performance of sessions that do not require the cached
data will not be affected.
Lookup Type – This method is restricted
to lookups based on a single column number or string searches.
In Oracle9i Release 2, associative arrays (index-by tables) could be
indexed using string values for the first time, so prior to this
release the method is restricted to single number column searches
only.
Bad Implementation - Constantly test
the performance improvements to make sure they actually do what is
expected. It is possible to take a good idea and implement it
badly ending up with poor results.
The cached_lookup_api package is implemented
in a slightly different way to reinforce the last point. The
previous examples proved that caching data in collections is quicker
than hitting the database. Therefore the packages created by the
cached_lookup_api_incorrect.sql script should work faster than
database calls.
cached_lookup_api_incorrect.sql
CREATE OR
REPLACE PACKAGE cached_lookup_api AS
PROCEDURE
populate_tab;
PROCEDURE
get_cached_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE);
PROCEDURE
get_db_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE);
END
cached_lookup_api;
/
SHOW ERRORS
CREATE OR
REPLACE PACKAGE BODY cached_lookup_api AS
TYPE t_tab
IS TABLE OF cached_lookup_tab%ROWTYPE;
g_tab t_tab;
--
-----------------------------------------------------------------
PROCEDURE populate_tab AS
-- -----------------------------------------------------------------
BEGIN
g_tab := t_tab();
FOR i IN (SELECT *
FROM
cached_lookup_tab)
LOOP
g_tab.extend;
g_tab(g_tab.last) := i;
END LOOP;
/*
-- More efficient.
-- Use this for Oracle9i Release 2 upwards.
SELECT *
BULK COLLECT INTO g_tab
FROM cached_lookup_tab;
*/
END populate_tab;
-- -----------------------------------------------------------------
--
-----------------------------------------------------------------
PROCEDURE get_cached_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE) AS
-- -----------------------------------------------------------------
BEGIN
<< array_loop >>
FOR i IN g_tab.first .. g_tab.last LOOP
IF g_tab(i).id = p_id THEN
p_info := g_tab(i);
EXIT array_loop;
END IF;
END LOOP array_loop;
END get_cached_info;
-- -----------------------------------------------------------------
--
-----------------------------------------------------------------
PROCEDURE get_db_info (p_id IN
cached_lookup_tab.id%TYPE,
p_info OUT cached_lookup_tab%ROWTYPE) AS
-- ----------------------------------------------------------------
BEGIN
SELECT *
INTO p_info
FROM cached_lookup_tab
WHERE id = p_id;
EXCEPTION
WHEN NO_DATA_FOUND THEN
p_info := NULL;
END get_db_info;
-- -----------------------------------------------------------------
BEGIN
-- Load table during package initialization.
populate_tab;
END cached_lookup_api;
/
SHOW ERRORS
Before discussing the implementation of this
package, recreate the previous test and see how the results vary.
SQL> @cached_lookup_api_test.sql
100
Cached Lookup (100 rows): 3
DB Lookup (100 rows) : 1
PL/SQL
procedure successfully completed.
PL/SQL
procedure successfully completed.
SQL> @cached_lookup_api_test.sql
1000
Cached Lookup (1000 rows): 21
DB Lookup (1000 rows) : 18
PL/SQL
procedure successfully completed.
SQL> @cached_lookup_api_test.sql
10000
Cached Lookup (10000 rows): 208
DB Lookup (10000 rows) : 181
PL/SQL
procedure successfully completed.
SQL> @cached_lookup_api_test.sql
100000
Cached Lookup (100000 rows): 2075
DB Lookup (100000 rows) : 1809
PL/SQL
procedure successfully completed.
These results clearly demonstrate that using
the collection is slower than hitting the database; the opposite of
what was proved earlier.
The reason for this discrepancy is the type of
collection used in this example. The first example used an
associative array (index-by table), which allowed the desired row to
be referenced directly because the array index was the ID value.
The second example used a regular table collection, so the only way to
retrieve the data was to loop through it looking for a match. This
method proved very inefficient, especially for large tables.
The next section shows how packages can be
used to cache global information.
|