The Role Of Functional Decomposition
The principles of top-down analysis tell us
to begin our data flow diagram (DFD) at a very general level. The
entire system is viewed as a single process, and this view is called
a Context Level DFD. Next, the DFD is decomposed, and levels of
detail are added to the model. Any process that can be identified
can probably be subdivided into smaller processes, and it is possible
to decompose a DFD to the level where each process represents a
single statement. An extreme example of functional decomposition
would be showing a statement such as add 1 to counter as a separate
process on the data flow diagram. The pivotal question is: At what
point should the developer stop decomposing the processes?
Theoreticians such as Gane and Sarson tell
us that a DFD should be decomposed to the functional primitive
level, where each process bubble performs one granular function.
Under this definition, one could consider that the place_order
behavior performs only one function and is therefore a functional
primitive process. A good rule of thumb for data warehouse analysis,
especially when the warehouse is intended to be used with a
relational database, is that a DFD should be decomposed to the level
where each process corresponds to an SQL operation. This
allows the use of triggers within a relational database and greatly
simplifies the data warehouse design.
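To make this concrete, the following is a minimal sketch of how a
single-operation process such as decrement_inventory might be
implemented as an Oracle trigger. The table and column names
(LINE_ITEM, ITEM, QTY, QTY_ON_HAND, ITEM_NBR) are illustrative
assumptions only, not part of any schema defined in this chapter:

    -- Hypothetical sketch: the decrement_inventory process as a
    -- row-level trigger that performs exactly one SQL operation
    CREATE OR REPLACE TRIGGER decrement_inventory
       AFTER INSERT ON line_item
       FOR EACH ROW
    BEGIN
       -- One process, one UPDATE on ITEM
       UPDATE item
          SET qty_on_hand = qty_on_hand - :NEW.qty
        WHERE item_nbr = :NEW.item_nbr;
    END;
    /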
As you are probably beginning to see, the
level of partitioning is critical for a successful data warehouse
systems analysis. Let's explore this concept of partitioning a
little further. In a traditional systems analysis, the place_order
behavior is considered sufficiently partitioned, and the place_order
process would become a single computer program. This program would
perform all of the data manipulation and would have many DML verbs
embedded within the code (see Figure 3.4). Following is what the
mini-spec might look like for a place_order process:
CHECK_CREDIT(SELECT on CUSTOMER)
ADD_ORDER(INSERT on ORDER)
CHECK_INVENTORY_LEVEL(SELECT on ITEM)
DECREMENT_INVENTORY(UPDATE on ITEM)
ADD_LINE_ITEM(INSERT on LINE_ITEM)
Figure 3.4 A Level 1 data flow diagram.
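Rendered as code, such a monolithic program would embed all of these
DML verbs in a single unit. The fragment below is a hypothetical
sketch only; the parameter names, the single-item simplification, and
the table name ORDERS (since ORDER is a reserved word in Oracle) are
all assumptions:

    -- Hypothetical sketch: the place_order behavior as one monolithic
    -- program with many embedded DML verbs
    CREATE OR REPLACE PROCEDURE place_order (
       p_cust_nbr  IN NUMBER,
       p_order_nbr IN NUMBER,
       p_item_nbr  IN NUMBER,
       p_qty       IN NUMBER,
       p_total_amt IN NUMBER)
    AS
       v_rating  customer.credit_rating%TYPE;
       v_on_hand item.qty_on_hand%TYPE;
    BEGIN
       -- CHECK_CREDIT: SELECT on CUSTOMER
       SELECT credit_rating INTO v_rating
         FROM customer WHERE cust_nbr = p_cust_nbr;
       IF p_total_amt > 1000 AND v_rating = 'BAD' THEN
          RAISE_APPLICATION_ERROR(-20001, 'Order rejected: bad credit');
       END IF;
       -- ADD_ORDER: INSERT on ORDER
       INSERT INTO orders (order_nbr, cust_nbr, total_amt)
       VALUES (p_order_nbr, p_cust_nbr, p_total_amt);
       -- CHECK_INVENTORY_LEVEL: SELECT on ITEM
       SELECT qty_on_hand INTO v_on_hand
         FROM item WHERE item_nbr = p_item_nbr;
       IF v_on_hand >= p_qty THEN
          -- DECREMENT_INVENTORY: UPDATE on ITEM
          UPDATE item SET qty_on_hand = qty_on_hand - p_qty
           WHERE item_nbr = p_item_nbr;
          -- ADD_LINE_ITEM: INSERT on LINE_ITEM
          INSERT INTO line_item (order_nbr, item_nbr, qty)
          VALUES (p_order_nbr, p_item_nbr, p_qty);
       END IF;
    END place_order;
    /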
While this level of decomposition is fine for
traditional systems analysis, it is better to continue to decompose
the place_order behavior in relational data warehouses. In a
relational data warehouse, place_order would be divided into its
sub-processes, namely check_credit, add_order, check_inventory_level,
decrement_inventory, and add_line_item (see Figure 3.5).
Figure 3.5 A decomposed data flow diagram.
Each of the sub-processes within the
place_order behavior will have its own process logic specifications,
and each sub-process can be encapsulated into an Oracle stored
procedure or trigger. The behavior place_order is now decomposed
into its sub-behaviors, as shown by the mini-spec in Listing 3.2.
Listing 3.2 Mini-spec for a Level 2 DFD.
MINI-SPEC for CHECK_CREDIT
    IF TOTAL_AMT > 1000
        Check CREDIT_RATING in CUSTOMER
        IF CREDIT_RATING = 'BAD' then reject
    ELSE Store ORDER RECORD

MINI-SPEC for CHECK_INVENTORY
    FOR (each item on the order)
        -- Compare QTY_ORDERED in ORDER with QTY_ON_HAND in ITEM
        IF QTY_ON_HAND < QTY_ORDERED
            Remove item from order
            Prepare backorder slip

MINI-SPEC for ADD_LINE_ITEM
    FOR (each item on the order which is in stock)
        -- Add the item to the order
        Subtract QTY_ORDERED from QTY_ON_HAND.
        Move QTY_ORDERED to QTY in LINE_ITEM.
        Store LINE_ITEM record.
As you can see, each behavior within the
database now corresponds to a single database update operation. The
insert on the line_item record, for example, is associated with the
add_line_item behavior. This one-to-one partitioning is only important for
database updates, as select operations may still access many records
within the database.
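As a sketch of this encapsulation, the add_line_item sub-process
might be packaged as an Oracle stored procedure that issues exactly
one DML statement. The parameter names below are hypothetical:

    -- Hypothetical sketch: one sub-behavior, one database update
    CREATE OR REPLACE PROCEDURE add_line_item (
       p_order_nbr IN NUMBER,
       p_item_nbr  IN NUMBER,
       p_qty       IN NUMBER)
    AS
    BEGIN
       -- The only DML verb in this procedure: INSERT on LINE_ITEM
       INSERT INTO line_item (order_nbr, item_nbr, qty)
       VALUES (p_order_nbr, p_item_nbr, p_qty);
    END add_line_item;
    /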
There is still a great deal of controversy
about the best way to approach database analysis for data warehouse
systems. Architecturally, some theoreticians state that the
relational model is better suited for use in an online transaction
processing environment and multidimensional architectures are better
suited to data warehouses. Oracle Corporation has stated that they
will incorporate a user-defined datatype in their future release of
Oracle8. Some argue that a database must be able to have datatypes
that are lists rather than finite values, and some databases, such
as UniSQL, allow for single datatypes (fields) to contain lists of
values or even another table.
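As an illustration only, a list-valued column might look something
like the following, using the VARRAY collection type announced for
Oracle8; the type and table names here are assumptions:

    -- Hypothetical sketch: a single field that contains a list of values
    CREATE TYPE phone_list AS VARRAY(5) OF VARCHAR2(20);
    /
    CREATE TABLE customer (
       cust_nbr  NUMBER,
       cust_name VARCHAR2(40),
       phones    phone_list);  -- one column, many values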
Developers must remember that the main
difference between traditional systems analysis and data warehouse
analysis is the focus on the data sources. Because a data warehouse
will never have data coming from an online data input screen and all
of the end user interfaces will be ad hoc in nature, data
identification, loading, and query processing are of foremost importance.
Data Warehouse Analysis
Another basic distinction between
traditional systems and data warehouse systems is the goal of the
analysis phase. Data warehouse analyses are data-driven, unlike
traditional systems where process logic is the central focus. But,
how does an analyst shift from thinking about systems in the
traditional way to thinking about systems as a collection of
integrated data items?
As discussed earlier, traditional analysis
is focused on understanding the process logic and documenting the
data behind each process. Data warehouse analysis differs in that
the source databases already exist and have been clearly defined.
Therefore, the goal of data warehouse analysis is to understand
which data items are of interest to the warehouse users, how to
extract these data items from the operational databases, how to load
and package the new warehouse, and how to deliver the information to
the end users. In short, data warehouse analysis involves the following activities:
* Process Analysis--Defines the data
loading and extraction processes.
* Data Source Analysis--Describes the
data items that will be of interest to end users. This phase also
describes how the data will be edited, cleansed, aggregated, and summarized.
* Data Loading Analysis--Concentrates on
describing how data will be loaded into a data warehouse.
* Data Query Analysis--Focuses on how
end users will use the data.
Let's take a look at each of these analysis phases.
Data Warehouse Source Analysis
In this phase of warehouse analysis, developers identify end user data
requirements and legacy data sources. This process begins by
understanding the types of queries that the end user community
desires from their new database. Data source analysis always begins
by soliciting input from the data warehouse user community. Unlike other
systems where the end-users may be semi-skilled workers, the
end-users of a data warehouse are generally highly trained
statisticians whose job is to identify and exploit trends within
their functional area. Examples of this type of end user would be
financial managers and marketing analysts. It is only by
interrogating the end users that the data warehouse analyst can
understand how source data will be captured from external systems.
Remember, a data warehouse always begins as a tabula rasa, and the
data is loaded and transformed from numerous external sources.