 |
|
Identifying CPU Bottlenecks
with vmstat
Oracle Application Server Tips by Burleson
Consulting |
Waiting CPU resources can be shown in UNIX
vmstat command output as the second column under the kthr (kernel
thread state change) heading (see Listing 2-1). Tasks may be placed
in the wait queue (?b?) if they are waiting on a resource, while
other tasks appear in the run queue (?r?) column.
In short, the server is experiencing a CPU
bottleneck when ?r? is greater than the number of CPU?s on the
server. To see the number of CPUs on the server, you can use one of
the following UNIX commands.
Remember that we need to know the number of
CPUs on our server because the vmstat runqueue value must never
exceed the number of CPUs. A runqueue value of 32 is perfectly
acceptable for a 36-CPU server, while a value of 32 would be a
serious problem for a 24 CPU server.
In the example below, we run the vmstat
utility. For our purposes, we are interested in the first two
columns: the run queue ?r?, and the kthr wait ?b? column. In the
listing below we see that there are an average of about eight new
tasks entering the run queue every five seconds (the ?r? column),
while there are five other tasks that are waiting on resources (the
?b? column). Also, a nonzero value in the (?b?) column may indicate
a bottleneck.
root>
vmstat 5 5
kthr
memory
page
faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi
po fr sr cy in
sy cs us sy id wa
7 5 220214 141 0 0
0 42 53 0 1724 12381 2206 19 46 28
7
9 5 220933 195 0 0
1 216 290 0 1952 46118 2712 27 55 13 5
13 5 220646 452 0 0
1 33 54 0 2130 86185 3014 30 59
8 3
6 5 220228 672 0 0
0 0 0 0 1929 25068 2485 25
49 16 10
The rule for identifying a server with CPU
resource problems is quite simple. Whenever the value of the
runqueue ?r? column exceeds the number of CPUs on the server, tasks
are forced to wait for execution. There are several solutions to
managing CPU overload, and these alternatives are presented in their
order of desirability:
1. Add more processors (CPUs) to the server.
2. Load balance the system tasks by
rescheduling large batch tasks to execute during off-peak hours.
3. Adjust the dispatching priorities (nice
values) of existing tasks.
To understand how dispatching priorities
work, we must remember that incoming tasks are placed in the
execution queue according to their nice value. Tasks with a
low nice value are scheduled for execution above those tasks with a
higher nice value. Now that we can see when the CPUs are
overloaded, let?s look into vmstat further and see how we can tell
when the CPUs are running at full capacity.
Identifying High CPU Usage with vmstat
We can also easily detect when we are
experiencing a busy CPU on the Oracle database server. Whenever the
?us? (user) column plus the ?sy? (system) column times approach
100%, the CPUs are operating at full capacity .
Please note that it is not uncommon to see
the CPU approach 100 percent even when the server is not overwhelmed
with work. This is because the UNIX internal dispatchers will always
attempt to keep the CPUs as busy as possible. This maximizes task
throughput, but it can be misleading for a neophyte.
Remember, it is not a cause for concern when
the user + system CPU values approach 100 percent. This just means
that the CPUs are working to their full potential. The only metric
that identifies a CPU bottleneck is when the run queue (?r? value)
exceeds the number of CPUs on the server.
root>
vmstat 5 1
kthr
memory
page
faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi
po fr sr cy in sy cs
us sy id wa
0 0 217485 386 0 0
0 4 14 0 202 300 210 20 75
3 2
The approach of capturing server information
along with Oracle information provides the Oracle9iAS administrator
with a complete picture of the operation of the system.
Monitoring RAM Memory Consumption
In the UNIX environment, RAM memory is
automatically managed by the operating system. In system with
?virtual? memory, a special disk called swap is used to hold chunks
of RAM that cannot fit within the available RAM on the server. In
this fashion, a virtual memory server can allow tasks to allocate
memory above the RAM capacity on the server. As the server is used,
the operating system will move some memory pages out to the swap
disk in case the server exceeds its physical capacity. This is
called a page-out operation. Remember, page-out operations occur
even when the database server has not exceeded the RAM capacity.
RAM memory shortages are evidenced by
page-in operations. Page-in operations cause Oracle9iAS slowdowns
because tasks must wait until their memory region is moved back into
RAM from the swap disk. There are several remedies for overloaded
RAM memory:
Next, let?s move on and take a look at how
to build an easy UNIX server monitor by extending the Oracle
STATSPACK tables.
Now that we see how to monitor the
Oracle9iAS servers, let?s examine how we can use this data to
perform server load balancing.
Identifying UNIX Server overload
Once the data is captured in the
stats$vmstat table, there is a wealth of reports that can be
generated. Because all of the server statistics exist inside a
single Oracle table, it is quite easy to write SQL*Plus queries to
extract the data.
The vmstat data can be used to generate all
types of interesting reports. There are four classes of vmstat
reports:
-
Exception reports - These reports
show the time period where predefined thresholds are exceeded.
-
Daily trend reports - These reports
are often run and used with Excel spreadsheets to produce trending
graphs.
-
Hourly trend reports - These
reports show the average utilization, averaged by the hour of the
day. These reports are very useful for showing peak usage periods
in a production environment.
-
Long-term predictive reports -
These reports generate a long-term trend line for performance. The
data from these reports is often used with a linear regression to
predict when additional RAM memory or CPU power is required for
the server.
Let?s now examine the script that can be
used to generate these server reports and see how this information
can help us tune our Oracle database.
This is an excerpt from "Oracle
10g Application Server Administration Handbook" by Don Burleson
and John Garmany.