Tuesday, October 25, 2016

Diagnosing RAC Instance eviction issue

An instance eviction occurs when a member is evicted from the group by another member of the cluster database. This can happen for several reasons, including a communications error in the cluster or a failure to issue a heartbeat to the control file. The mechanism is in place to prevent problems that would otherwise affect the entire database.
For example, instead of allowing a cluster-wide hang to occur, Oracle will evict the problematic instance(s) from the cluster. An ORA-29740 error indicates that a surviving instance has removed the problem instance(s) from the cluster. When the problem is detected, the instances 'race' to get a lock on the control file (Results Record lock) for updating. The instance that obtains the lock tallies the votes of the instances to decide membership. A member is evicted if:

a) A communications link is down
b) There is a split-brain (more than 1 subgroup) and the member is not in the largest subgroup
c) The member is perceived to be inactive
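
The membership rules above can be illustrated with a small Python sketch. This is only a simplified model for reasoning about the outcome, not Oracle's actual voting code: the function name, the input shape (sets of instance numbers that can still reach each other), and the tie-break rule for equally sized subgroups are all assumptions made for illustration.

```python
def members_to_evict(subgroups, inactive=()):
    """Return the instance numbers evicted under a simplified membership model.

    subgroups: iterable of sets of instance numbers that can still
               communicate with each other (hypothetical input shape).
    inactive:  instance numbers perceived as inactive, e.g. no heartbeat.
    """
    groups = [set(g) for g in subgroups if g]
    if not groups:
        return set()

    # The largest subgroup survives. ASSUMPTION: on a tie, the subgroup
    # containing the lowest instance number wins; the article does not say.
    survivor = max(groups, key=lambda g: (len(g), -min(g)))

    everyone = set().union(*groups)
    evicted = everyone - survivor          # rule (b): not in the largest subgroup
    evicted |= set(inactive) & everyone    # rule (c): perceived to be inactive
    return evicted


if __name__ == "__main__":
    # Split-brain: instance 4 is cut off from instances 1, 2 and 3.
    print(members_to_evict([{1, 2, 3}, {4}]))
    # Single group, but instance 2 is not heartbeating.
    print(members_to_evict([{1, 2, 3}], inactive=[2]))
```

In this model a broken communications link (rule a) simply shows up as more than one subgroup, so it is handled by the same comparison as rule (b).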


The reason codes for instance eviction are:

Reason 0 = No reconfiguration  
Reason 1 = The Node Monitor generated the reconfiguration.
Reason 2 = An instance death was detected.
Reason 3 = Communications Failure
Reason 4 = Reconfiguration after suspend
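
When reading through trace files it can help to translate the numeric reason code into its description. The sketch below is a plain lookup built from the list above. The regular expression that looks for the word "reason" followed by a number is an assumption about how the code might appear in a trace line, so adjust it to whatever your own trace files actually contain.

```python
import re

# Reason codes exactly as listed in this post.
EVICTION_REASONS = {
    0: "No reconfiguration",
    1: "The Node Monitor generated the reconfiguration",
    2: "An instance death was detected",
    3: "Communications failure",
    4: "Reconfiguration after suspend",
}

# ASSUMPTION: the code appears as "reason <number>" in the text being scanned.
REASON_PATTERN = re.compile(r"reason\s+(\d+)", re.IGNORECASE)


def describe_reason(code):
    """Map a numeric reason code to its description."""
    return EVICTION_REASONS.get(code, f"Unknown reason code {code}")


def reasons_in_text(text):
    """Return (code, description) pairs for every 'reason N' found in text."""
    return [
        (int(m.group(1)), describe_reason(int(m.group(1))))
        for m in REASON_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    # Hypothetical trace line, used only to exercise the function.
    print(reasons_in_text("Initiating reconfig, reason 3"))
```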


To determine the reason behind your instance eviction:

a) Review each instance's alert log
b) Review each instance's LMON trace file
c) Review the OS Watcher logs, if OS Watcher is in use. If it is not installed on all nodes of your RAC cluster, install it. Refer to Note 301137.1 for the installation procedure and usage instructions
d) Review the CKPT process trace file of the evicted instance
e) Review any other bdump or udump files generated at the exact time of the instance eviction
f) Review each node's syslog or messages file
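
As a starting point for the checklist above, the following Python sketch scans a set of log and trace files for the ORA-29740 error and reports where it appears, so you know which files and line numbers to read around. The function name is made up for this example and the file paths in the usage section are placeholders, not real locations. Replace them with the alert log, LMON and CKPT trace files, and syslog paths on your own nodes.

```python
from pathlib import Path


def find_matching_lines(paths, needle="ORA-29740"):
    """Return (file, line number, line text) for every line containing needle.

    Files that do not exist are skipped rather than treated as errors, since
    not every node will have every file.
    """
    hits = []
    for path in paths:
        p = Path(path)
        if not p.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes in it.
        with p.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line:
                    hits.append((str(p), lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # PLACEHOLDER paths: substitute the real files from each node.
    candidate_files = [
        "alert_instance1.log",
        "instance1_lmon.trc",
        "instance1_ckpt.trc",
    ]
    for filename, lineno, text in find_matching_lines(candidate_files):
        print(f"{filename}:{lineno}: {text}")
```

Once the time of the ORA-29740 error is known, the same function can be reused with a different search string, such as the timestamp, to pull the matching period out of the OS Watcher and syslog files.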
