I recently looked at some code written in
Oracle7-style PL/SQL, and compared it with what is
considered best practice with Oracle8i and 9i. The more
up-to-date code generally runs around 3-5 times faster than
the old-style code, and, as part of understanding where the
changes to PL/SQL have occurred, I've started off by looking
at how Oracle now handles a concept known as 'collections.'
OTN's
Oracle9i PL/SQL User Guide introduces collections as:
"A collection is an ordered group of elements, all of
the same type. It is a general concept that encompasses
lists, arrays, and other familiar datatypes. Each
element has a unique subscript that determines its
position in the collection."
So what are these used for? Well, when you first start
using PL/SQL, you generally end up declaring variables, then
retrieving a value from a table to go into the variable. If
you're going to retrieve a range of values from a table, you
declare a cursor, retrieve the values one at a time from the
database, and process them sequentially. Sometimes though,
you want to retrieve a load of values all in one go, and
load them into an array, so that you can carry out some sort
of operation on the group of values as a whole.
With Oracle7, you could create what was called an
index-by table, or 'PL/SQL table,' that consisted of a
series of value pairs; an index value, and a scalar datatype
(such as varchar2 or number). You referred to an individual
PL/SQL table entry by using the index (e.g.
CUSTOMER_NAME(10)). What made it interesting was that, as
well as using scalar datatypes, you could also create PL/SQL
tables using PL/SQL records, which could consist of a number
of individual columns. By creating a PL/SQL record type
based on an existing table (for example, by using
SCOTT.EMP%ROWTYPE), you could load a table row, or an entire
table, into a variable and process it within your PL/SQL
package.
For example:
DECLARE
  TYPE EmpTabTyp IS TABLE OF emp%ROWTYPE
    INDEX BY BINARY_INTEGER;
  emp_tab EmpTabTyp;
BEGIN
  /* Retrieve employee record. */
  SELECT * INTO emp_tab(7468) FROM emp WHERE empno = 7468;
  ...
END;
This would set up a PL/SQL table type that consists of
PL/SQL records with the same definition as the columns in
SCOTT.EMP, and then create a variable based on this type. It
would then retrieve a single row from the EMP table and
place it into the EMP_TAB variable, using the index value of
7468. Note that you don't need to initialize the variable
before it's used, and you can arbitrarily put values into
the variable using any valid BINARY_INTEGER value.
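For instance, here's a minimal sketch (using a made-up NameTabTyp type and hypothetical names) showing how entries can be assigned at arbitrary, sparse index values, and then walked in order with the FIRST and NEXT attributes:

```sql
DECLARE
  TYPE NameTabTyp IS TABLE OF VARCHAR2(30)
    INDEX BY BINARY_INTEGER;
  customer_name NameTabTyp;
  i BINARY_INTEGER;
BEGIN
  -- No constructor needed: just assign to any index value
  customer_name(10)   := 'SMITH';
  customer_name(5000) := 'JONES';
  customer_name(-3)   := 'ADAMS';  -- negative indexes are valid too

  -- Walk the sparse table in index order with FIRST and NEXT
  i := customer_name.FIRST;
  WHILE i IS NOT NULL LOOP
    DBMS_OUTPUT.PUT_LINE(i || ': ' || customer_name(i));
    i := customer_name.NEXT(i);
  END LOOP;
END;
```

Note that NEXT skips straight from index -3 to 10 to 5000; the table stores only the entries you assign.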
As of Oracle8, PL/SQL tables were renamed index-by tables,
and the family of collection types was supplemented by two
new composite types: nested tables, and VARRAYs.
Nested tables extend the functionality of index-by tables
by adding extra collection methods (known as table
attributes for index-by tables), and in a new development,
nested tables can also be stored in database tables and can
be directly manipulated using SQL. Collectively, both types
are known as PL/SQL tables.
To declare a nested table, you use the following syntax
(note the lack of 'INDEX BY BINARY_INTEGER'):
TYPE type_name IS TABLE OF element_type [NOT NULL];
There are some key differences between traditional
index-by tables, and nested tables, and some good reasons
why you'd want to start using nested tables, which are
detailed in the
online documentation. In practical terms, one major
difference between nested tables and index-by tables, is
that you have to initialize a nested table using a
constructor (as you would in Java), defining how many
elements it initially holds (although you can later EXTEND
the nested table); however, as mentioned above,
you can store nested tables within the database (embedded in
database columns), which is a distinct advantage over
index-by tables. So, if you want to put together a database
that is object-oriented, and you need the data to be
persistent, nested tables are the way to go.
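As a quick sketch of that difference, here's a hypothetical nested table of NUMBERs being initialized with a constructor and then grown with EXTEND:

```sql
DECLARE
  TYPE NumTabTyp IS TABLE OF NUMBER;     -- nested table: no INDEX BY clause
  n NumTabTyp := NumTabTyp(10, 20, 30);  -- constructor: starts with 3 elements
BEGIN
  n.EXTEND;        -- make room for a fourth element
  n(4) := 40;
  n.EXTEND(2, 1);  -- append two copies of element 1 (i.e. two more 10s)
  DBMS_OUTPUT.PUT_LINE('Count = ' || n.COUNT);  -- now 6 elements
END;
```

Referencing n(4) before the EXTEND would raise a SUBSCRIPT_BEYOND_COUNT exception, which is exactly the behaviour an index-by table doesn't have.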
The other new type of collection, as of Oracle8, is
VARRAYs. A VARRAY (variable-length array) is a datatype that
would be familiar to Java or C programmers. It has a maximum
size, which you must specify in its type definition. Its
index has a fixed lower bound of 1 and an extensible upper
bound. Thus, a VARRAY can contain a varying number of
elements, from zero (when empty) to the maximum specified in
its type definition.
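A short sketch (with a made-up CalTyp type) of a VARRAY's fixed maximum size and extensible upper bound:

```sql
DECLARE
  TYPE CalTyp IS VARRAY(12) OF VARCHAR2(9);      -- maximum of 12 elements
  months CalTyp := CalTyp('JAN', 'FEB', 'MAR');  -- starts with 3 elements
BEGIN
  months.EXTEND;                       -- upper bound grows from 3 to 4
  months(4) := 'APR';
  DBMS_OUTPUT.PUT_LINE(months.COUNT);  -- 4: current number of elements
  DBMS_OUTPUT.PUT_LINE(months.LIMIT);  -- 12: the declared maximum
END;
```

Trying to EXTEND beyond the declared limit of 12 raises SUBSCRIPT_OUTSIDE_LIMIT, which is the key contrast with nested tables, which have no fixed maximum.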
So, how do you choose which of these three types
(index-by tables, nested tables and varrays) to use?
According to the docs (which also give detailed reasons to
choose nested tables over associative arrays, and between
nested tables and varrays):
"If you already have code or business logic that uses
some other language, you can usually translate that
language's array and set types directly to PL/SQL
collection types.
- Arrays in other languages become VARRAYs in PL/SQL.
- Sets and bags in other languages become nested
tables in PL/SQL.
- Hash tables and other kinds of unordered lookup
tables in other languages become associative arrays in
PL/SQL."
So, why do the answers suggest using collections -- in
this case, nested tables?
The original method for loading the table as put forward
in the question was:
CREATE TABLE t1 AS
SELECT *
FROM all_objects
WHERE 1=0;

CREATE OR REPLACE PROCEDURE test_proc IS
BEGIN
  FOR x IN (SELECT * FROM all_objects)
  LOOP
    INSERT INTO t1
      (owner, object_name, subobject_name, object_id,
       data_object_id, object_type, created, last_ddl_time,
       timestamp, status, temporary, generated, secondary)
    VALUES
      (x.owner, x.object_name, x.subobject_name, x.object_id,
       x.data_object_id, x.object_type, x.created,
       x.last_ddl_time, x.timestamp, x.status, x.temporary,
       x.generated, x.secondary);
  END LOOP;
  COMMIT;
END test_proc;
Ignoring the initial table creation script, the test_proc
procedure does three things:
1. Declares a cursor that points to the resultset from
SELECT * FROM ALL_OBJECTS
2. Starts at record one, and inserts into the t1 table the
columns from the first row in the cursor
3. Then, it loops back and gets the next row of data, until
all rows from the cursor have been retrieved.
The data is then committed, and the procedure ends.
The first solution put forward uses a nested table to
hold the data from the ALL_OBJECTS table, and does something
called BULK COLLECT to load all of the source table's data
into the nested table.
CREATE OR REPLACE PROCEDURE fast_proc (p_array_size IN PLS_INTEGER DEFAULT 100)
IS
  TYPE ARRAY IS TABLE OF all_objects%ROWTYPE;
  l_data ARRAY;
  CURSOR c IS
    SELECT *
    FROM all_objects;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_data LIMIT p_array_size;
    FORALL i IN 1..l_data.COUNT
      INSERT INTO t1 VALUES l_data(i);
    EXIT WHEN c%NOTFOUND;
  END LOOP;
  CLOSE c;
END fast_proc;
/
Ignore the fact that the type's called ARRAY -- it's a
nested table, not a VARRAY.
The second example answer is a variation on this, that
does much the same thing with slightly more compact code;
create or replace procedure fast_proc is
  type TObjectTable is table of ALL_OBJECTS%ROWTYPE;
  ObjectTable$ TObjectTable;
begin
  select *
  BULK COLLECT INTO ObjectTable$
  from ALL_OBJECTS;

  forall x in ObjectTable$.First..ObjectTable$.Last
    insert into t1 values ObjectTable$(x);
end;
/
Again, a nested table is declared, but this time the
cursor is dispensed with, and the data is just bulk loaded
directly into the nested table. Again, the FORALL statement
is used afterwards to run through the nested table, and I'll
go into how BULK COLLECT and FORALL work tomorrow.
So what is the message here? Well, first of all, if you
need to process lots of rows in one go, loading the data
into memory first, use a collection and BULK COLLECT the
data into it.