Wednesday, February 15, 2017

How to relocate RAC Database Services?

$ srvctl relocate service -h

Temporarily relocates service from one node of the cluster to another.

Usage: srvctl relocate service -d <db_unique_name> -s <service_name> {-i <old_inst_name> -t <new_inst_name> | -c <current_node> -n <target_node>} [-f]
       Specify instances for an administrator-managed database, or nodes for a policy managed database
    -d <db_unique_name>      Unique name for the database
    -s <service>             Service name
    -i <old_inst>            Old instance name
    -t <new_inst>            New instance name
    -c <current_node>        Node name to relocate service from
    -n <target_node>         Node name to relocate service to
    -f                       Disconnect all sessions during stop or relocate service operations
    -h                       Print usage

Example:
-------

[oracle@uktul02oradb51b ~]$ srvctl config service -d TUL10ATF_SDC
TUL_TUL10ATF_ATG1 PREF: TUL02ATF1 AVAIL: TUL02ATF2
TUL_TUL10ATF_ATG2 PREF: TUL02ATF1 AVAIL: TUL02ATF3
TUL_TUL10ATF_TIBCO PREF: TUL02ATF1 TUL02ATF2 TUL02ATF3 AVAIL:


Before the relocation, the services were running on their available (failover) instances:

$ srvctl status service -s TUL_TUL10ATF_ATG1 -d TUL10ATF_SDC
Service TUL_TUL10ATF_ATG1 is running on instance(s) TUL02ATF2

$ srvctl status service -s TUL_TUL10ATF_ATG2 -d TUL10ATF_SDC
Service TUL_TUL10ATF_ATG2 is running on instance(s) TUL02ATF3

$ srvctl status service -s TUL_TUL10ATF_TIBCO -d TUL10ATF_SDC
Service TUL_TUL10ATF_TIBCO is running on instance(s) TUL02ATF2, TUL02ATF3

Relocate them back to the preferred instance:

$ srvctl relocate service -d TUL10ATF_SDC -s TUL_TUL10ATF_ATG1 -i TUL02ATF2 -t TUL02ATF1 -f

$ srvctl relocate service -d TUL10ATF_SDC -s TUL_TUL10ATF_ATG2 -i TUL02ATF3 -t TUL02ATF1 -f

$ srvctl relocate service -d TUL10ATF_SDC -s TUL_TUL10ATF_TIBCO -i TUL02ATF2 -t TUL02ATF1 -f
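The end state of all services for the database can also be confirmed in one command, without naming each service; this sketch assumes the same database unique name as above:

$ srvctl status service -d TUL10ATF_SDC

With no -s option, srvctl reports the status of every service registered against the database, which makes it easy to spot any service still running on an available instance.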

10053 event used for CBO Tracing

What is the CBO doing, and how does it arrive at an execution plan? This is where the 10053 trace event comes into play: the 10053 trace shows us HOW the CBO came to its decision (the final execution plan).


 10053 trace for "GOOD" and "BAD" execution:  

alter session set timed_statistics = true; 
alter session set statistics_level=all; 
alter session set max_dump_file_size = unlimited; 
alter session set events '10053 trace name context forever, level 1'; 
-- /*Run SQL here*/ ------ 
alter session set events '10053 trace name context off'; 


From 11gR2 onwards there is a new method to obtain an optimizer (10053) trace for any SQL statement that is already in the cursor cache, without actually re-executing it. We can either use the DBMS_SQLDIAG.DUMP_TRACE procedure or the new diagnostic events infrastructure to obtain the 10053 trace for a given sql_id.


Demonstration 1:

1. Find out the sql_id for the given statement:

SELECT sql_id,child_number FROM v$sql WHERE sql_text LIKE '%SQL TEXT%';

2. Input the sql_id to the DBMS_SQLDIAG.DUMP_TRACE package procedure:

execute DBMS_SQLDIAG.DUMP_TRACE(p_sql_id=>'sql_id', p_child_number=>0, p_component=>'Compiler', p_file_id=>'ABCD');

The trace file with the ABCD suffix can be found under the trace directory, typically $ORACLE_BASE/diag/rdbms/<dbname>/<instance_name>/trace.
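The exact trace directory can be confirmed from the database itself; a quick sketch using the standard V$DIAG_INFO view (available from 11g onwards):

SQL> SELECT name, value FROM v$diag_info WHERE name IN ('Diag Trace', 'Default Trace File');

'Diag Trace' shows the directory trace files are written to, and 'Default Trace File' shows the trace file of the current session.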

Demonstration 2: (excerpt from MOS Note)
SQL> alter session set max_dump_file_size = unlimited;
SQL> ALTER SESSION SET EVENTS 'trace[rdbms.SQL_Optimizer.*][sql:sql_id]'; 
SQL> --Execute the query --
To disable the trace
SQL> ALTER SESSION SET EVENTS 'trace[rdbms.SQL_Optimizer.*] off';


Refer to MOS Note:

How to Obtain Tracing of Optimizer Computations (EVENT 10053) [ID 225598.1]
To tag the trace file for easy identification, set a trace file identifier first:
alter session set tracefile_identifier='10053_&your_name';



SQLT and 10046 and 10053 trace


1. Get an SQLT XTRACT report for the poorly performing query (baxk1rgw5ht0c).


You may review the notes below for additional information: 
--> Download SQLT tool from here SQLT Diagnostic Tool ( Doc ID 215187.1 ) 
--> Install SQLT as per SQLT Usage Instructions ( Doc ID 1614107.1 ) 
--> Identify your SQL ID as per document How to Determine the SQL_ID for a SQL Statement ( Doc ID 1627387.1 ) 
--> Execute sqlt XTRACT method for your SQL_ID 

# cd sqlt/run 
# sqlplus apps 
SQL> START sqltxtract.sql <your_sql_id> sqltxplain_password 

SQLT XTRACT will display all the execution plans available for this query, so we can see whether there is a plan stability issue. It will also provide an SQL Tuning Advisor (STA) report to see whether any recommendations are available.
>>> FAQ: SQLT (SQLTXPLAIN) Frequently Asked Questions ( Doc ID 1454160.1 ) 

2. 10046 trace for "GOOD" and "BAD" execution: 
alter session set tracefile_identifier='10046_&your_name'; 
alter session set timed_statistics = true; 
alter session set statistics_level=all; 
alter session set max_dump_file_size = unlimited; 
alter session set events '10046 trace name context forever,level 12'; 
-- /*Run SQL here*/ ------ 
select 'CLOSE CURSOR' close_cursor from dual; --- to ensure the previous cursor is closed (or select * from dual;) 
alter session set events '10046 trace name context off'; 
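The raw 10046 trace is much easier to read after formatting with tkprof; a minimal sketch, with hypothetical file names:

$ tkprof ORCL_ora_12345_10046_scott.trc 10046_scott.out sys=no sort=exeela,fchela

sys=no suppresses recursive SYS statements, and the sort options list the most expensive statements (by execute and fetch elapsed time) first.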

3. 10053 trace for "GOOD" and "BAD" execution: 
alter session set tracefile_identifier='10053_&your_name'; 
alter session set timed_statistics = true; 
alter session set statistics_level=all; 
alter session set max_dump_file_size = unlimited; 
alter session set events '10053 trace name context forever, level 1'; 
-- /*Run SQL here*/ ------ 
alter session set events '10053 trace name context off'; 

Steps to Create RAC Service on Windows


Add the service using the command below:

>srvctl add service -d UKTILDB -s TIL_UKTILDB_TIBCO_UK -r UKTILDB1 -a UKTILDB2

The added service will be listed as offline, since we added it as our own user.
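The service configuration can be verified with srvctl before attempting to bring it online; this assumes the same database and service names as above:

>srvctl config service -d UKTILDB

This lists each service of the database with its preferred (-r) and available (-a) instances as specified when the service was added.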

>crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....DB2.srv application    ONLINE    ONLINE    ukti...b01a
ora....LING.cs application    ONLINE    ONLINE    ukti...b01a
ora....DB1.srv application    ONLINE    ONLINE    ukti...b01a
ora....IBCO.cs application    ONLINE    ONLINE    ukti...b01a
ora....DB1.srv application    OFFLINE   OFFLINE
ora....O_UK.cs application    OFFLINE   OFFLINE
ora....B1.inst application    ONLINE    ONLINE    ukti...b01a
ora....B2.inst application    ONLINE    ONLINE    ukti...b01b
ora.UKTILDB.db application    ONLINE    ONLINE    ukti...b01a
ora....SM1.asm application    ONLINE    ONLINE    ukti...b01a
ora....1A.lsnr application    ONLINE    ONLINE    ukti...b01a
ora....01a.gsd application    ONLINE    ONLINE    ukti...b01a
ora....01a.ons application    ONLINE    ONLINE    ukti...b01a
ora....01a.vip application    ONLINE    ONLINE    ukti...b01a
ora....SM2.asm application    ONLINE    ONLINE    ukti...b01b
ora....1B.lsnr application    ONLINE    ONLINE    ukti...b01b
ora....01b.gsd application    ONLINE    ONLINE    ukti...b01b
ora....01b.ons application    ONLINE    ONLINE    ukti...b01b
ora....01b.vip application    ONLINE    ONLINE    ukti...b01b


NAME=ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.UKTILDB1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.cs
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

Log in as the service account user and add the service account user to the Remote Desktop Users group. The service status will still show as offline.

>crs_stat -t

Name           Type           Target    State     Host
------------------------------------------------------------
ora....DB2.srv application    ONLINE    ONLINE    ukti...b01a
ora....LING.cs application    ONLINE    ONLINE    ukti...b01a
ora....DB1.srv application    ONLINE    ONLINE    ukti...b01a
ora....IBCO.cs application    ONLINE    ONLINE    ukti...b01a
ora....DB1.srv application    OFFLINE   OFFLINE
ora....O_UK.cs application    OFFLINE   OFFLINE
ora....B1.inst application    ONLINE    ONLINE    ukti...b01a
ora....B2.inst application    ONLINE    ONLINE    ukti...b01b
ora.UKTILDB.db application    ONLINE    ONLINE    ukti...b01a
ora....SM1.asm application    ONLINE    ONLINE    ukti...b01a
ora....1A.lsnr application    ONLINE    ONLINE    ukti...b01a
ora....01a.gsd application    ONLINE    ONLINE    ukti...b01a
ora....01a.ons application    ONLINE    ONLINE    ukti...b01a
ora....01a.vip application    ONLINE    ONLINE    ukti...b01a
ora....SM2.asm application    ONLINE    ONLINE    ukti...b01b
ora....1B.lsnr application    ONLINE    ONLINE    ukti...b01b
ora....01b.gsd application    ONLINE    ONLINE    ukti...b01b
ora....01b.ons application    ONLINE    ONLINE    ukti...b01b
ora....01b.vip application    ONLINE    ONLINE    ukti...b01b


Check the permissions on the service resources:

>crs_getperm ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.UKTILDB1.srv
Name: ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.UKTILDB1.srv
owner:system:rwx,pgrp::---,other::r--,

>crs_getperm ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.cs
Name: ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.cs
owner:system:rwx,pgrp::---,other::r--,

Then change the owner of the resources to the service account user.

>crs_setperm ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.UKTILDB1.srv -o TIL\uktil02ighsorldb01


>crs_setperm ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.cs -o TIL\uktil02ighsorldb01

>crs_getperm ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.UKTILDB1.srv
Name: ora.UKTILDB.TIL_UKTILDB_TIBCO_UK.UKTILDB1.srv
owner:TIL\uktil02ighsorldb01:rwx,pgrp::---,other::r--,

Then start the service which was created earlier.

>srvctl start service -d UKTILDB -s TIL_UKTILDB_TIBCO_UK

>crs_stat -t

Name           Type           Target    State     Host
------------------------------------------------------------
ora....DB2.srv application    ONLINE    ONLINE    ukti...b01a
ora....LING.cs application    ONLINE    ONLINE    ukti...b01a
ora....DB1.srv application    ONLINE    ONLINE    ukti...b01a
ora....IBCO.cs application    ONLINE    ONLINE    ukti...b01a
ora....DB1.srv application    ONLINE    ONLINE    ukti...b01a
ora....O_UK.cs application    ONLINE    ONLINE    ukti...b01a
ora....B1.inst application    ONLINE    ONLINE    ukti...b01a
ora....B2.inst application    ONLINE    ONLINE    ukti...b01b
ora.UKTILDB.db application    ONLINE    ONLINE    ukti...b01a
ora....SM1.asm application    ONLINE    ONLINE    ukti...b01a
ora....1A.lsnr application    ONLINE    ONLINE    ukti...b01a
ora....01a.gsd application    ONLINE    ONLINE    ukti...b01a
ora....01a.ons application    ONLINE    ONLINE    ukti...b01a
ora....01a.vip application    ONLINE    ONLINE    ukti...b01a
ora....SM2.asm application    ONLINE    ONLINE    ukti...b01b
ora....1B.lsnr application    ONLINE    ONLINE    ukti...b01b
ora....01b.gsd application    ONLINE    ONLINE    ukti...b01b
ora....01b.ons application    ONLINE    ONLINE    ukti...b01b
ora....01b.vip application    ONLINE    ONLINE    ukti...b01b

======================================================



GPNP Profile - Oracle Clusterware 11g Release 2

WHAT IS GPNP PROFILE?

The GPnP profile is a small XML file located in GRID_HOME/gpnp/<hostname>/profiles/peer under the name profile.xml. It is used to establish the correct global personality of a node. Each node maintains a local copy of the GPnP profile, which is kept up to date by the GPnP daemon (GPnPD).
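The local copy can be inspected directly from the file system as well; a sketch, where GRID_HOME stands for the Grid Infrastructure home on that node:

$ cat $GRID_HOME/gpnp/`hostname`/profiles/peer/profile.xml

Editing this file by hand is not supported; changes should be made through the clusterware tools so that GPnPD can replicate them to all nodes.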

WHAT DOES GPNP PROFILE CONTAIN?

The GPnP profile is used to store information required for the startup of Oracle Clusterware, such as the SPFILE location, the ASM disk string, etc.

It contains various attributes defining node personality:
- Cluster name
- Network classifications (public/private)
- Storage to be used for CSS
- Storage to be used for ASM: SPFILE location, ASM disk string, etc.
- Digital signature information: the profile is security sensitive. It might identify the storage to be used as the root partition of a machine. Hence, it contains digital signature information of the provisioning authority.

Here is the GPnP profile of our RAC setup. gpnptool can be used for reading and editing the GPnP profile.

[root@host01 peer]# gpnptool get
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="7" ClusterUId="14cddaccc0464f92bfc703ec1004a386" ClusterName="cluster01" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.9.201.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.0.0.0" Adapter="eth1" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+DATA/cluster01/asmparameterfile/registry.253.783619911"/><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>4VMorzxVNa+FeOx2SCk1unVBpfU=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>bbzV04n2zSGTtUEvqqB+pjw1vH7i8MOEUqkhXAyloX0a41T2FkDEA++ksc0BafndAk7tR+6LGdppE1aOsaJUtYxQqaHJdpVsJF+sj2jN7LPJlT5NBt+K7b08TLjDID92Se6vEiDAeeKlEbpVWKMUIvQvp6LrYK8cDB/YjUnXuGU=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>

WHO UPDATES GPNP PROFILE?

GPnPD replicates changes to the profile during:
- installation
- system boot
- updates

The profile is updated whenever changes are made to the cluster with configuration tools such as:
- oifcfg (change network)
- crsctl (change location of voting disk)
- asmcmd (change ASM_DISKSTRING, SPFILE location), etc.
HOW IS GPNP PROFILE USED BY CLUSTERWARE?

To start the clusterware, the voting disk needs to be accessed. If the voting disk is on ASM, this information (that the voting disk is on ASM) is read from the GPnP profile (<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>). The voting disk is read using the kfed utility even if ASM is not up.
Next, the clusterware checks whether all the nodes have the updated GPnP profile, and the node joins the cluster based on the GPnP configuration. Whenever a node is started or added to the cluster, the clusterware software on the starting node starts a GPnP agent.
- If the node is already part of the cluster, the GPnP agent reads the existing profile on that node.
- If the node is being added to the cluster, the GPnP agent locates an agent on an existing node using the multicast protocol (provided by mDNS) and gets the profile from that agent.
Next, CRSD needs to read the OCR to start up the various resources on the node, and to update it as the status of resources changes. Since the OCR is also on ASM, the location of the ASM SPFILE must be known.
The search order for the ASM SPFILE is:
- GPnP profile
- ORACLE_HOME/dbs/spfile<sid>.ora
- ORACLE_HOME/dbs/init<sid>.ora
In a cluster environment, the location of the ASM SPFILE is read from the GPnP profile.

[grid@host01 peer]$ gpnptool getpval -asm_spf
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/app/11.2.0/grid/bin/gpnptool.bin getpval -asm_spf -p=profile.xml -o-
+DATA/cluster01/asmparameterfile/registry.253.793721441
The output of the query shows that the SPFILE is on ASM in the DATA diskgroup.

To find out the location of the ASM disks, the following query is issued:
[root@host01 peer]# gpnptool getpval -asm_dis
ASM-Profile id="asm" DiscoveryString=""
The device headers of every device in the disk string returned by the above query are scanned (if one was configured at ASM initial setup time). Here the discovery string is blank, as the ASM_DISKSTRING parameter has not been set; hence the headers of all the ASM disks are scanned.
Below, I have shown the output only for the disk which contains the SPFILE (spfflg is not null).

[root@host01 ~]# kfed read /dev/sdb3 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                       16 ; 0x0f4: 0x00000010
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
In the output above, we see that:
- the device /dev/sdb3 contains a copy of the ASM SPFILE (spfflg=1)
- the ASM SPFILE location starts at the disk offset of 16 (spfile=16)

Considering the allocation unit size (kfdhdb.ausize = 1M), let's dump the ASM SPFILE from the device:
[root@host01 ~]#  dd if=/dev/sdb3 of=spfileASM_Copy2.ora skip=16  bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.170611 seconds, 6.1 MB/s
[root@host01 ~]# strings spfileASM_Copy2.ora
+ASM1.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM2.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.asm_diskgroups='FRA'#Manual Mount
+ASM2.asm_diskgroups='FRA'#Manual Mount
+ASM1.asm_diskgroups='FRA'#Manual Mount
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
Using the parameters in the SPFILE, ASM is started.
Once ASM is up, the OCR is read by CRSD and the various resources on the node are started.
Each node reads the network information in the GPnP profile and, using GNS, negotiates an appropriate network identity for itself. Hence, nodes can be dynamically added or deleted.
------------------------------------------------------------
GPNPTOOL COMMAND REFERENCE:

How to read the profile
[root@inssc3 bin]# ./gpnptool get

How to check whether the GPnP daemon is running on the local node
[root@host01 peer]# gpnptool lfind
Success. Local gpnpd found.

How to find the location of the ASM SPFILE when ASM is down
[root@host01 peer]# gpnptool getpval -asm_spf
+DATA/cluster01/asmparameterfile/registry.253.783619911

How to find all RD-discoverable resources of given type
[root@host01 peer]# gpnptool find
Found 3 instances of service 'gpnp'.
        mdns:service:gpnp._tcp.local.://host03:18015/agent=gpnpd,cname=cluster01,host=host03,pid=5066/gpnpd h:host03 c:cluster01
        mdns:service:gpnp._tcp.local.://host02:17637/agent=gpnpd,cname=cluster01,host=host02,pid=5236/gpnpd h:host02 c:cluster01
        mdns:service:gpnp._tcp.local.://host01:16633/agent=gpnpd,cname=cluster01,host=host01,pid=5206/gpnpd h:host01 c:cluster01

log file sync

Oracle foreground processes wait on this event for a COMMIT or ROLLBACK to complete.

Tune LGWR to get good throughput to disk; e.g., do not put redo logs on RAID-5.

Reduce the overall number of commits by batching transactions so that there are fewer distinct COMMIT operations.

Reference Note# 34592.1
High Waits on log file sync Note# 125269.1
Tuning the Redolog Buffer Cache and Resolving Redo Latch Contention Note# 147471.1
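The system-wide impact of this wait can be gauged from V$SYSTEM_EVENT; a quick sketch:

SQL> SELECT event, total_waits, time_waited, average_wait
     FROM v$system_event
     WHERE event = 'log file sync';

If average_wait (in centiseconds) is high relative to the 'log file parallel write' event, the time is being lost between the foreground process and LGWR rather than in the disk write itself.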

How to work when we get an archive file system alert?

  When we receive an OEM archive file system alert, how do we check and resolve it?
   
  • If this is a STANDALONE database
    • Check the Archivelog_log output and look for BACKUP_COUNT=2; those are the files that can be removed safely from disk using the RMAN command below:
      1. RMAN> delete archivelog all backed up 2 times to device type 'SBT_TAPE' completed before 'sysdate-7';
      2. sysdate-N can be modified to 5 or 6 based on the situation if we are running out of space
    • Check from /nsr/applogs that the previous day's archivelog backup was successful (on some databases this is scheduled as a separate script, and on some it is part of the complete backup)
    • If it has not run at the scheduled time, or has failed, call the TB Networker on-call on +91 7760976961 and ask them to kick off the archivelog backup alone.

  • If this is a PRIMARY database
    • Check the Archivelog_log output and look for BACKUP_COUNT=2 for the primary archive files; for the remote (standby) destination the sequences should show APPLIED=YES. Those are the files that can be removed safely from disk using the RMAN command below:
      1. RMAN> delete archivelog all backed up 2 times to device type 'SBT_TAPE' completed before 'sysdate-7';
      2. sysdate-N can be modified to 5 or 6 based on the situation if we are running out of space
    • Check from /nsr/applogs that the previous day's archivelog backup was successful (on some databases this is scheduled as a separate script, and on some it is part of the complete backup)
    • If it has not run at the scheduled time, or has failed, call the TB Networker on-call on +91 7760976961 and ask them to kick off the archivelog backup alone.

  • If this is a STANDBY database
    • Check the Archivelog_log output and look for APPLIED=YES; those are the files that can be removed safely from disk using the RMAN command below:
      1. RMAN> delete archivelog all completed before 'sysdate-7';
      2. sysdate-N can be modified to 5 or 6 based on the situation if we are running out of space
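The applied status that the standby procedure relies on can be cross-checked from the database before deleting anything; a quick sketch using the standard V$ARCHIVED_LOG view:

SQL> SELECT sequence#, applied
     FROM v$archived_log
     ORDER BY sequence#;

Only sequences showing APPLIED=YES are safe to remove from disk on the standby.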