Question: I have an Oracle task that is
hanging. What causes Oracle to hang? How can I detect the
cause of an Oracle hang? How do I fix an Oracle hang problem?
Answer: Oracle hangs only when he is waiting
for a resource. It might be a latch (lock), or server resources
shortage (CPU, RAM).
For more complete details on diagnosing a hung database, see the book "Advanced Oracle Utilities: The
Definitive Reference".
Other possibilities for Oracle hanging include:
-
Pathing hang - Make sure that
your Oracle environment variables are properly set ($ORACLE_HOME).
-
External hang - The network
being down, Kerberos security issues, SSO or a firewall issue can
cause an Oracle connection to hang.
One way to test this is to set
sqlnet.authentication_services=(none) in your sqlnet.ora file
and retry connecting.
-
Listener down/hanging - Start
by checking the listener (check lsnrctl stat).
Also, see my notes on diagnosing Oracle network connectivity
issues.
-
RAM shortage hang - Over
allocation of server resources, usually RAM, whereby there is not
enough RAM to spawn another connection to Oracle.
-
Data hanging - It is not
uncommon for an end-user session to "hang" when they are trying to
grab a shared data resource that is held by another end-user.
-
CPU exhausted - When the
vmstat runqueue (r) column value exceeds cpu_count, you are CPU
constrained.
-
Exceeding the "processes" parameter - When you
exceed the number of the initialization "processes" parameter, the
instance will hang, and not accept new connections. Issue the
SQL*Plus command "show parameter process".
Bouncing the database is not the answer, you need to know "why" the
instance is locking up!
You may be locked up because you are hitting a system limit.
Also, make sure to check your SQL*Net log and the alert log for the
database.
Check for CPU shortages
Also, check vmstat and see if the runqueue (r) column number exceed
the cpu_count.
The only metric that identifies a CPU bottleneck is
when the run queue (r value) exceeds the number of CPUs on the server.
See high_cpu
Check for RAM shortages
Use the vmstat output and look at the scan rate (sr) and page-in (pi)
columns. To aid in measuring real page-ins, the UNIX and Linux
vmstat utility yields the scan rate (sr) column which designates the
memory page scan rate. If the scan rate rises steadily, the
page-stealing daemon's first threshold will be identified, indicating
that that particular program's entire RAM memory regions are being
paged-out to the swap disk. This behavior can then be correlated
with the vmstat page-in (pi) metric. See
tuning server RAM
Now, let's look at how to debug a hung database.
Debugging a
hung database
Back in Oracle 10g a hung database was a real
problem, especially if the DBA could not connect via SQL*Plus to release
the source of the hanging.
In these cases, the DBA had few options other than to force-down the
instance and warmstart it.
There is a new feature in Oracle 11g SQL*Plus
called the 'prelim' option. This option is very useful for running oradebug and other utilities that do not require a real connection to
the database.
root> sqlplus –prelim
SQL>
or
SQL>
set _prelim on
SQL> connect / as sysdba
At this point you are free to run the oradebug
commands to diagnose a hung database issue using the new
hanganalyze utility:
-
SQL> oradebug hanganalyze 3
-
Wait at least 2 minutes to give time to
identify process state changes.
-
SQL> oradebug hanganalyze 3
-
Open a separate SQL session and immediately
generate a system state dump.
-
SQL> alter session set events 'immediate trace
name SYSTEMSTATE level 10';
You can also gat a trace
dump to identify hung session details in Oracle Real Application
Clusters (RAC) as follows:
-
SQL> oradebug setmypid
-
SQL> oradebug setinst all
-
SQL> oradebug -g def hanganalyze 3
-
... Wait at least 2 minutes to give time to
identify process state changes.
-
SQL> oradebug -g def hanganalyze 3
There are several other ways to find the root cause
of an Oracle hanging issue:
-
Automatic Session History -
Check ASH during the hang (if you have purchased the license):
-
Report - Run a couple of
STATSPACK or AWR snapshot during the hang and get an
elapsed time report.
-
Check for locking - Whenever
Oracle is has a session hangs waiting on a resource, this
information can be found in the v$session view in the row_wait_file#
and row_wait_block#.
Column host format a6;
Column username format a10;
Column os_user format a8;
Column program format a30;
Column tsname format a12;
select
b.machine host,
b.username username,
b.server,
b.osuser os_user,
b.program program,
a.tablespace_name ts_name,
row_wait_file# file_nbr,
row_wait_block# block_nbr,
c.owner,
c.segment_name,
c.segment_type
from
dba_data_files a,
v$session
b,
dba_extents
c
see code depot for full scripts
where
b.row_wait_file# = a.file_id
and
c.file_id = row_wait_file#
and
row_wait_block# between c.block_id and c.block_id + c.blocks - 1
and
row_wait_file# <> 0
and
type='USER';
See these related notes on fixing Oracle hang issues:
-
-
-
-
-