 |
|
Oracle9iAS Server Exception
Reports
Oracle Application Server Tips by Burleson
Consulting |
The SQL script vmstat_alert.sql can quickly
give a complete exception report on all of the servers in our Oracle
environment. This report will display times when the CPU and RAM
memory exceed your predefined thresholds:
set lines
80;
set pages 999;
set feedback off;
set verify off;
column
my_date heading 'date hour'
format a20
column c2 heading runq
format 999
column c3 heading pg_in format
999
column c4 heading pg_ot format
999
column c5 heading usr
format 999
column c6 heading sys
format 999
column c7 heading idl
format 999
column c8 heading wt
format 999
ttitle 'run queue > 2|May indicate an overloaded CPU|When runqueue
exceeds
the number of CPUs| on the server, tasks are waiting for service.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in)
c3,
avg(page_out)
c4,
avg(user_cpu)
c5,
avg(system_cpu) c6,
avg(idle_cpu)
c7
from
perfstat.stats$vmstat
WHERE
runque_waits > 2
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
ttitle 'page_in
> 1|May indicate overloaded memory|Whenever Unix performs a page-in,
the RAM memory | on the server has been exhausted and swap pages are
being used.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in)
c3,
avg(page_out)
c4,
avg(user_cpu) c5,
avg(system_cpu) c6,
avg(idle_cpu)
c7
from
perfstat.stats$vmstat
WHERE
page_in > 1
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
ttitle 'user+system
CPU > 70%|Indicates periods with a fully-loaded CPU
subssystem.|Periods of 100% utilization are only a | concern when
runqueue values exceeds the number of CPs on the server.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in)
c3,
avg(page_out)
c4,
avg(user_cpu)
c5,
avg(system_cpu) c6,
avg(idle_cpu)
c7
from
perfstat.stats$vmstat
WHERE
(user_cpu + system_cpu) > 70
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
The standard vmstat alert report is used to
alert the Oracle9iAS administrator and systems administrator to
out-of-bounds conditions on each Oracle server. These conditions
include:
-
CPU waits > 40% (AIX version only)
- This may indicate I/O-based contention. The solution is to
spread files across more disks or add buffer memory.
-
Runqueue > xxx ? (where xxx is the
number of CPUs on the server, 2 in this example)? This indicates
an overloaded CPU. The solution is to add additional processors to
the server.
-
Page_in > 2 - Page-in operations
indicate overloaded memory. The solution is to reduce the size of
the Oracle SGA, PGA, or add additional RAM memory to the server.
-
User CPU + System CPU > 90% - This
indicates periods where the CPU is highly utilized.
While the SQL here is self-explanatory,
let?s look at a sample report and see how it will help our systems
administrator monitor the server?s behavior:
SQL> @vmstat_alert
7
Wed
Dec 20
page 1
run queue > 2
May indicate an overloaded CPU.
When runqueue exceeds the number of CPUs
on the server, tasks are waiting for service.
SERVER_NAME date
hour runq pg_in pg_ot usr
sys idl
--------------- -------------------- ---- ----- ----- ---- ---- ----
AD-01
00/12/13 17
3 0 0 87
5 8
Wed
Dec 20
page 1
page_in > 1
May indicate overloaded memory.
Whenever Unix performs a page-in, the RAM memory
on the server has been
exhausted and swap pages are being used.
SERVER_NAME date
hour runq pg_in pg_ot usr
sys idl
----------------- -------------------- ---- ----- ----- ---- ----
----
AD-01
00/12/13 16
0 5 0
1 1 98
AD-01
00/12/14 09
0 5 0 10
2 88
AD-01
00/12/15 16
0 6 0
0 0 100
AD-01
00/12/19 20
0 29 2 1
2 98
PROD1DB
00/12/13 14
0 3 43 4
4 93
PROD1DB
00/12/19 07
0 2 0
1 3 96
PROD1DB
00/12/19 11
0 3 0
1 3 96
PROD1DB
00/12/19 12
0 6 0
1 3 96
PROD1DB
00/12/19 16
0 3 0
1 3 96
PROD1DB
00/12/19 17
0 47 68 5
5 91
Wed
Dec 20
page 1
user+system > 70%
Indicates periods with a fully-loaded CPU sub-system.
Periods of 100% utilization are only a
concern when runqueue values exceeds the number
of CPUs on the server.
SERVER_NAME date
hour runq pg_in pg_ot usr
sys idl
----------------- -------------------- ---- ----- ----- ---- ----
----
AD-01
00/12/13 14
0 0 2 75
2 22
AD-01
00/12/13 17
3 0 0 87
5 8
AD-01
00/12/15 15
0 0 0 50
29 22
AD-01
00/12/15 16
0 0 0 48
33 20
AD-01
00/12/19 07
0 0 0 77
4 19
AD-01
00/12/19 10
0 0 0 70
5 24
AD-01
00/12/19 11
1 0 0 60
17 24
PROD1
00/12/19 12
0 0 1 52
30 18
PROD1
00/12/19 13
0 0 0 39
59 2 PROD1
00/12/19 14
0 0 0 39
55 6 PROD1
00/12/19 15
1 0 0 57
23 20
You may notice that this exception report
gives the hourly average for the vmstat information. If you look at
the get_vmstat.ksh script, you will see that the data is captured in
intervals of every 300 elapsed seconds (5-minute intervals). Hence,
if you see an hour where your server is undergoing stress, you can
modify your script to show the vmstat changes every five minutes.
You can also run this report in conjunction with other Oracle9iAS
monitoring reports to identify what tasks may have precipitated the
server problem.
Daily vmstat Trend Reports
One of the jobs of the Oracle9iAS
administrator is to monitor the database and the server for regular
trends. This is not just an exercise in searching for trends because
every database will exhibit regular patterns of CPU and memory
consumption.
Using the stats$vmstat table, it is very
easy to write a query that will aggregate the CPU and memory. Below
is a sample SQL script that aggregates server values:
connect
perfstat/perfstat;
set pages 9999;
set
feedback off;
set verify off;
column
my_date heading 'date' format a20
column c2 heading runq
format 999
column c3 heading pg_in format
999
column c4 heading pg_ot format
999
column c5 heading usr
format 999
column c6 heading sys
format 999
column c7 heading idl
format 999
column c8 heading wt
format 999
select
to_char(start_date,'day') my_date,
-- avg(runque_waits) c2
-- avg(page_in)
c3,
-- avg(page_out)
c4,
avg(user_cpu + system_cpu)
c5,
-- avg(system_cpu)
c6,
-- avg(idle_cpu) c7,
avg(wait_cpu)
c8
from
stats$vmstat
group BY
to_char(start_date,'day')
order by
to_char(start_date,'day') ;
Here we can see that we can easily get any
of the vmstat values aggregated by day. In the output below we see
the average user and wait CPU times for each day of the week:
SQL>
@rpt_vmstat_dy
Connected.
date
usr wt
-------------------- ---- ----
friday
8 0
monday
10 0
saturday
1 0
sunday 1
0
thursday
6 0
tuesday
15 0
wednesday
11 0
This data can be extracted into MS-Excel and
quickly plotted for graphical reference. Please note that Donald
Burleson?s Oracle High Performance Tuning with STATSPACK covers a
method of plotting STATSPACK data in MS-Excel.
Hourly vmstat Trend Reports
We can use the same techniques to average
vmstat information by the hour of the day. An average by hour of the
day can provide very valuable information regarding times when the
server is experiencing stress:
connect
perfstat/perfstat;
set pages 9999;
set
feedback off;
set verify off;
column
my_date heading 'date' format a20
column c2 heading runq
format 999
column c3 heading pg_in format
999
column c4 heading pg_ot format
999
column c5 heading cpu
format 999
column c6 heading sys
format 999
column c7 heading idl
format 999
column c8 heading wt
format 999
select
to_char(start_date,'day') my_date,
-- avg(runque_waits) c2
-- avg(page_in)
c3,
-- avg(page_out)
c4,
avg(user_cpu + system_cpu)
c5,
-- avg(system_cpu)
c6,
-- avg(idle_cpu)
c7,
avg(wait_cpu) c8
from
stats$vmstat
group BY
to_char(start_date,'day')
order by
to_char(start_date,'day')
;
Here we see the output from this script and
we get the average runqueue and user + system CPU values and wait
CPU values, aggregated by hour of the day:
SQL>
@rpt_vmstat_hr
Connected.
date
runq cpu wt
-------------------- ---- ---- ----
00
0 4 0
01
0 5 0
02
0 3 0
03
0 1 0
04
0 1 0
05
0 1 0
06
0 1 0
07
0 1 0
08
0 1 0
09
0 1 0
10
0 1 0
11
0 1 0
12
0 11 0
13
0 21 0
14
0 23 0
15
0 20 0
16
0 15 0
17
0 20 0
18 0 12 0
19
0 10 0
20
0 5 0
21
0 1 0
22
0 1 0
23
0 1 0
This hourly information can also be
extracted into MS-Excel for graphical plotting charts that show
trends that may not be evident from a raw observation.
This is an excerpt from "Oracle
10g Application Server Administration Handbook" by Don Burleson
and John Garmany.