 |
|
Long-Term Server Analysis and
Trending
Oracle Application Server Tips by Burleson
Consulting |
You can also use the data from stats$vmstat
to gather information for long-term trend analysis. This
long-term trend analysis is very useful for the Oracle9iAS
administrator who must plan for additional application server
resources. Knowing the rate at which CPU and memory are being
consumed on the server is critical, since there is often a lag time
of several weeks between ordering and installing new hardware
resources. If you want more detail on using STATSPACK information
for management planning, please see my books Oracle High Performance
Tuning with STATSPACK, and The Oracle9i UNIX Administration Handbook
by Oracle Press.
Daily Server Alert Report
As we have repeatedly noted, the Oracle9iAS
administrator is very interested in monitoring conditions on the
Oracle database servers, Application servers and the Oracle HTTP
servers. This script is generally run daily to report on exceptional
conditions within any server in the Oracle environment. The data is
collected in five-minute intervals and reported with hourly
averages. When the Oracle9iAS administrator finds an out-of-bounds
server condition, they can run detailed reports that display the
data in five-minute intervals. These scripts from the code
depot will produce the vmstat reports on all Oracle9iAS servers.
run_vmstat.ksh - This is the driver
script that submits the vmstat alert report and e-mails the output
to the appropriate staff members.
vmstat_alert.sql - This report
provides information on the server conditions that may contribute to
poor performance.
CPU overload report with runqueue waits
When the runqueue exceeds the number of
CPUs, the server is experiencing CPU bottlenecks:
Fri
Dec 29
page 1
run queue > 2
May indicate an overloaded CPU
SERVER_NAME date
hour runq pg_in pg_ot usr
sys idl
----------------- -------------------- ---- ----- ----- ---- ----
----
BAD-01
00/12/22 13
6 0 0 62
7 32
BAD-01
00/12/22 15
3 0 0 82
18 0
BAD-01
00/12/22 17
3 0 0 76
16 8
BAD-01
00/12/27 11
3 0 0 77
5 20
RAM overload report with swapping
When page-in operations exist, the maximum
RAM capacity of the server has been exceeded. Most server
administrators rely on the pi column in vmstat to signal when
the server is swapping RAM memory. However, there is more to
the story.
There are times when the pi column will be
non-zero, even though there is no real RAM swapping. To
illustrate this let?s take a simple example. Suppose that we invoke
a 20 Megabyte Oracle executable program, such as a Pro*C program. We
don't need to load all 20 meg of the executable into RAM all at
once. Rather, we just want to load those pieces of the
executable code that require immediate execution. Hence, UNIX will
memory frames as necessary later and rely on the principle of
spatial locality to minimize the amount of pages in our RAM working
set.
To manage the memory segments, the UNIX
kernel builds a memory map of the entire program when it starts.
Included in this map is a note on whether the storage is ?in memory?
or ?on swap disk?. As the program starts it begins
accessing some of its pages that have never been loaded into RAM
memory. Hence, you may see vmstat page-in?s when a large
number of programs are starting and allocating their RAM memory.
During normal operation we may see various
points in time when paging in happens a lot and this is not always a
cause for concern. Remember, a UNIX process may page-in when
the UNIX program is starting or is accessing parts of it?s code that
it had not used before.
Paging out (the po column in vmstat) happens
frequently as UNIX prepares for the possibility of a page-in. With
UNIX virtual memory we are always anticipating running out of RAM
memory, and a page-out is a method for being ready for a subsequent
page-in. Also, as UNIX processes end they call the free() system
call to free the RAM pages so they can be used by new processes.
Internals of RAM memory paging
So if RAM paging in ?pi? may be
acceptable and paging out ?po? may be acceptable, how do we
tell when the RAM on a server is overstressed and swapping? One
answer is to correlate the UNIX scan rate with page-in operations.
When an Oracle server begins to run low on RAM memory, the page
stealing daemon process awakens and UNIX begins to treat the RAM
memory as a sharable resource, moving memory frames to the swap disk
with paging operations.
The page stealing daemon operates in two
modes. When RAM memory shortages are not critical, the daemon
will steal small chunks of least-recently-used RAM memory from a
program. As RAM resource demands continue to increase, the
page stealing daemon escalates and begins to page-out entire
programs RAM regions. In short, we cannot always tell if the page-in
operations that we see are normal housekeeping or a serious memory
shortage unless we correlate the activity of the page stealing
daemon with the page-in output.
To aid in this, the vmstat utility gives the
sr column to designate the memory page scan rate. If we see the scan
rate rising steadily, we will have hit the page stealing daemons
first threshold, indicating that entire programs RAM memory regions
are being paged out to the swap disk. Next, we will begin to
see high page-in numbers as the entire process in paged back into
RAM memory.
Carefully review the list below from vmstat.
The scan rate is the furthest right column, and here we see the
value of sr rising steadily as the page stealing daemon prepares for
a page in. As the sr value peaks, we see the page-in operation
(pi) as the real RAM memory on the Oracle server is exceeded.
root> vmstat 2
procs
memory
page
r
b w avm
free re at pi
po fr de sr
3 0 0
144020 12778 17 9
0 14 29 0
3
3 0 0
144020 12737 15 0
1 34 4 0
8
3 0 0
144020 12360 9 0
1 46 2 0
13
1 0 0
142084 12360 5 0
3 17 0 0
21
1 0 0
142084 12360 3 0
8 0 0 0
8
1 0 0
140900 12360 1 0
10 0 0 0
0
1 0 0
140900 12360 0 0
9 0 0 0
0
1 0 0
140900 12204 0 0
3 0 0 0
0
1 0 0
137654 12204 0 0
0 0 0 0
0
High CPU usage report
The DBA is often interested in times when
the database CPU utilization is greater than 95 percent.
Fri
Dec 29
page 1
user+system > 70%
Indicates an overloaded CPU
SERVER_NAME date
hour runq pg_in pg_ot usr
sys idl
----------------- -------------------- ---- ----- ----- ---- ----
----
AD-01
00/12/22 08
2 0 0 69
3 28
AD-01
00/12/22 13
12 0 0
89 11 1
AD-01
00/12/22 15
0 0 0 63
29 8
AD-01
00/12/22 17
1 0 0 53
27 20
AD-01
00/12/26 12
1 0 0 77
4 19
AD-01
00/12/27 11
3 0 0 86
6 9
Now let?s wrap-up this chapter with a review
of the major points.
This is an excerpt from "Oracle
10g Application Server Administration Handbook" by Don Burleson
and John Garmany.