Wednesday, September 7, 2016

Collecting Hanganalyze and Systemstate dumps

High level steps to follow:-

1.    Hanganalyze and systemstate Dumps
2.    Collecting Hanganalyze and Systemstate dumps
3.    Collection commands for Hanganalyze and Systemstate: Non-RAC:
4.    Collection commands for Hanganalyze and Systemstate: RAC
5.    Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088
6.    Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088
7.    Explanation of Hanganalyze and Systemstate Levels
8.    Taking Systemstate Dumps when You cannot Connect to Oracle

1.  Hanganalyze and systemstate Dumps

Hanganalyze and Systemstate dumps provide information on the processes in the database at a specific point in time. Hanganalyze provides information on all processes involved in the hang chain, whereas systemstate provides information on all processes in the database. When looking at a potential hang situation, you need to determine whether a process is stuck or moving slowly. By collecting these dumps at 2 consecutive intervals this can be established. If a process is stuck, these traces also provide the information to start further diagnosis and possibly help to provide the solution.
  • Hanganalyze is a summary and will confirm if the db is really hung or just slow and provides a consistent snapshot.
  • Systemstate dump shows what each process on the database is doing
 2.   Collecting Hanganalyze and Systemstate dumps

Using SQL*Plus connect as SYSDBA using the following command:
sqlplus '/ as sysdba'

If there are problems making this connection then in 10gR2 and above, the sqlplus "preliminary connection" can be used :

sqlplus -prelim '/ as sysdba'

Note: From 11.2.0.2 onwards, hanganalyze will not produce output under a sqlplus "preliminary connection" since it requires a process state object and a session state object. If a hanganalyze is attempted, the trace will output:

HANG ANALYSIS:

ERROR: Cannot perform hang analysis dump without a process state object and a session state object.
( process=(nil), sess=(nil) )
  
See: Note 986640.1 How To Connect Using A Sqlplus Preliminary Connection

3.  Collection commands for Hanganalyze and Systemstate: Non-RAC

Sometimes, database may actually just be very slow and not actually hanging. It is therefore recommended, where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".

Hanganalyze
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit

Systemstate
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
oradebug dump systemstate 266
oradebug tracefile_name
exit

4.  Collection commands for Hanganalyze and Systemstate: RAC

There are 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 very costly.
Therefore without these fixes in place it highly unadvisable to use these level

For information on these patches see:
Document 11800959.8 Bug 11800959 - A SYSTEMSTATE dump with level >= 10 in RAC dumps huge BUSY GLOBAL CACHE ELEMENTS - can hang/crash instances
Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance

5.  Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088

 sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug  unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
oradebug tracefile_name
exit

6.  Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088

sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
oradebug tracefile_name
exit

In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.

7.  Explanation of Hanganalyze and Systemstate Levels

Hanganalyze levels:

Level 3: In 11g onwards, level 3 also collects a short stack for relevant processes in hang chain

Systemstate levels:

Level 258 is a fast alternative but we'd lose some lock element data

Level 267 can be used if additional buffer cache / lock element data is needed with an understanding of the cost

8.  Taking Systemstate Dumps when You cannot Connect to Oracle

On very rare occasions it is not possible to make the sqlplus connection necessary for issuing ordinary commands used above. 
In that event, there is still a back door method of getting the systemstate dump using a debugger like dbx, if that is available on your system. 
The process you attach to will most likely dump core and be killed, so do not attach to an Oracle background process. 

Here is the syntax:

dbx -a PID (where PID = any oracle shadow process)

dbx() print ksudss(10)
...return value printed here
dbx() detach

First, you will need to find a shadow process:

$ ps -ef |grep sqlplus
osupport  78526 154096   0 12:11:05  pts/1  0:00 sqlplus scott/tiger
osupport  94130  84332   1 12:11:20  pts/3  0:00 grep sqlplus

$ps -ef |grep 78526
osupport  28348  78526   0 12:11:05      -  0:00 oracles734 (DESCRIPTION=(LOCAL
osupport  78526 154096   0 12:11:05  pts/1  0:00 sqlplus scott/tiger
osupport  94132  84332   1 12:11:38  pts/3  0:00 grep 78526

You will attach to shadow PID 28348, enter the print ksudss (10) command when the prompt comes back, and detach:

$ dbx -a 28348
Waiting to attach to process 28348 ...
Successfully attached to oracle.
warning: Directory containing oracle could not be determined.
Apply 'use' command to initialize source path.

Type 'help' for help.
reading symbolic information ...
stopped in read at 0xd016fdf0
0xd016fdf0 (read+0x114) 80410014        lwz   r2,0x14(r1)
(dbx) print ksudss(10) 2
(dbx) detach


You will find a systemstate in the user_dump_dest directory under the PID trace (and probably a useless core files as well). 

$ ls -lrt *28348*
-rw-r-----   1 osupport dba        46922 Oct 10 12:12 ora_28348.trc

core_28348:
total 72
-rw-r--r--   1 osupport dba        16567 Oct 10 12:12 core
drwxr-xr-x   7 osupport dba        12288 Oct 10 12:12 ../
drwxr-x---   2 osupport dba          512 Oct 10 12:12 ./

You will find the usual header information in the trace file. 

Make sure there is an END OF SYSTEM STATE in the file.  You can grep for it or do a search in vi.  If it is not there then the dump is not complete, possibly because max_dump_file size is too small in the init.ora.

Update for 10g and higher:

In some cases, no connections are allowed on the instance (in some ORA-20 situations for example).
As of 10.1.x, there is a new option with SQL*Plus to allow access to an instance to generate traces.

sqlplus -prelim / as sysdba

For example
export ORACLE_SID=PROD                                 ###Replace PROD with the SID you want to trace
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit;
oradebug dump systemstate 10
oradebug tracefile_name
exit

No comments:

Post a Comment