Tuesday, May 12, 2015

Performing Block Media Recovery

While running SQL statements on the database, suddenly SQLs may start failing with error messages showing that there is database block corruption in some datafile(s). Error messages in the Alert Log could be similar to the following.
ORA-01578: ORACLE data block corrupted (file # 4, block # 36321)
ORA-01110: data file 4: '

In case of physical corruption, error message could be as follows where checksome of block is ca
computed block checksum: 0x50
Reread of blocknum=86032, file= G:\ORACLE\ORADATA\TESTDB\MYDATA02.DBF. found same corrupt data
Hex dump of (file 5, block 86032) in trace file d:\oracle\admin\testdb\bdump\testdb_j006_7340.trc
Corrupt block relative dba: 0x01415010 (file 4, block 86032)
Bad header found during buffer read
Data in bad block:
 type: 1 format: 2 rdba: 0x000051c0
 last change scn: 0x81b4.00001add seq: 0x1a flg: 0x96
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x68b50601
 check value in block header: 0x4
 computed block checksum: 0x50
Reread of rdba: 0x01415010 (file 5, block 86032) found same corrupted data
Thu May 07 07:08:36 2015
Corrupt Block Found
         TSN = 4, TSNAME = MYDATA
         RFN = 5, BLK = 86032, RDBA = 21057552
         OBJN = 125890, OBJD = 125890, OBJECT = FINANCIAL_TABLE, SUBOBJECT =
         SEGMENT OWNER = SCOTT, SEGMENT TYPE = Table Segment

Starting 11g R2, Oracle can perform block recovery automatically if active dataguard is configured in the environment. In this case, if primary database encounters block corruption, it will automatically try recover the block(s) by requesting block(s) from the physical standby database. If block corruption is found in the physical standby database, it can request the correct block from primary database to perform the recovery.

Oracle provides a manual method to perform recovery at block level rather than restoring and recovering the whole datafile(s) where block corruption has happened. There are 2 types of block corruption, logical block corruption and physical block corruption. Method/command to perform recovery in either case is same, but we need to have a valid RMAN backup to perform data block recovery.

To check the corruption, we use BACKUP VALIDATE command from RMAN prompt. If CHECK LOGICAL option is not used in VALIDATE command; it does not check logical corruption, but checks only physical corruption, and using CHECK LOGICAL would check for both physical and logical corruption. In the following example I am trying to check both types of corruption after I saw “block checksum” error messages in alert log file of my database for datafile number 4.
RMAN> backup validate check logical database;
Starting backup at 23-APR-15
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=526 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00021 name=G:\ORACLE\ORADATA\TESTDB\SYSTEM01.DBF
input datafile fno=00072 name=G:\ORACLE\ORADATA\TESTDB\UNDOTBS01.DBF
input datafile fno=00074 name=G:\ORACLE\ORADATA\TESTDB\USERS.DBF
input datafile fno=00076 name=G:\ORACLE\ORADATA\TESTDB\SYSAUX01.DBF
input datafile fno=00078 name=G:\ORACLE\ORADATA\TESTDB\MYDATA01.DBF
input datafile fno=00080 name=G:\ORACLE\ORADATA TESTDB\MYDATA02.DBF
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:47
Finished backup at 23-APR-15


Next step is to query v$database_block_corruption view which would be populated by the VALIDATE command with the details of the corrupted blocks.
SQL> select * from v$database_block_corruption;

---------- ---------- ---------- ------------------ ---------------------------------------------------
        4          86032          1                               0                                    CHECKSUM
         1      12281             1                               0                                    FRACTURED
         1        665               1                               0                                    FRACTURED
         2       9202              1               303801173                                    CORRUPT
         2      24605             1                               0                                    FRACTURED
         2      46357             1                               0                                    FRACTURED
         2      50364             1               303801434                                    CORRUPT
         2      50380             1               303801434                                    CORRUPT

         2      50885             1                               0                                    FRACTURED

Now here is a tricky part: I have experienced that sometimes VALIDATE command won’t show the physical corruption – db verify command (dbv) comes handy in this situation which is a special tool to check physical corruption in the datafiles.

C:\Documents and Settings\Administrator>dbv file= G:\ORACLE\ORADATA\TESTDB\MYDATA02.DBF

DBVERIFY: Release - Production on Thu Apr 23 11:38:46 2015

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Page 86032 is marked corrupt
Corrupt block relative dba: 0x01415010 (file 5, block 86032)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 2 rdba: 0x000051c0
 last change scn: 0x81b4.00001add seq: 0x1a flg: 0x96
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x68b50601
 check value in block header: 0x4
 computed block checksum: 0x50

DBVERIFY - Verification complete

Total Pages Examined         : 262144
Total Pages Processed (Data) : 121366
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 4
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 435
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 140338
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 1322968318 (0.1322968318)

Performing Recovery
Following example shows how we perform a datablock recovery if backup is on the tape. If backups are on the disk, just allocate channel for the disk and rest of the method is same. 
If there is a long list of corrupted blocks, all can be recovered in a single go by using command "RECOVER CORRUPTION LIST", instead of following "BOCK RECOVER" command.
RMAN> run{
2> allocate channel ch0 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=C:\Progra~1\tivoli\tsm\agentoba64\tdpo.op
4> }

released channel: ORA_DISK_1
allocated channel: ch0
channel ch0: sid=526 devtype=SBT_TAPE
channel ch0: Data Protection for Oracle: version

Starting blockrecover at 23-APR-15

channel ch0: restoring block(s)
channel ch0: specifying block(s) to restore from backup set
restoring blocks of datafile 00103
channel ch0: reading from backup piece BKP_TESTDB_824020284_58988_1
channel ch0: restored block(s) from backup piece 1
piece handle=BKP_TESTDB_824020284_58988_1 tag=TAG20130821T060022
channel ch0: block restore complete, elapsed time: 00:05:46

starting media recovery
media recovery complete, elapsed time: 00:01:25

Finished blockrecover at 23-APR-15
released channel: ch0

No comments: