Thursday, November 26, 2015

ORA-00354: corrupt redo log block header

If you see ORA-00354 in your alert log file, as can be seen bellow
Tue Nov 24 14:36:25 2015
Thread 1 cannot allocate new log, sequence 16492
Private strand flush not complete
  Current log# 9 seq# 16491 mem# 0: D:\ORACLE\ORADATA\MYDB\REDO9_01.RDO
Thread 1 advanced to log sequence 16492
  Current log# 6 seq# 16492 mem# 0: D:\ORACLE\ORADATA\MYDB\REDO6_01.RDO
Tue Nov 24 14:36:27 2015
ARCH: Archival stopped, error occurred. Will continue retrying
Tue Nov 24 14:36:27 2015
Errors in file d:\oracle\admin\MYDB\bdump\MYDB_arc0_3148.trc:
ORA-16014: log 3 sequence# 16454 not archived, no available destinations
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'

Tue Nov 24 14:37:30 2015
Thread 1 cannot allocate new log, sequence 16493
Private strand flush not complete
  Current log# 6 seq# 16492 mem# 0: D:\ORACLE\ORADATA\MYDB\REDO6_01.RDO
Thread 1 advanced to log sequence 16493
  Current log# 4 seq# 16493 mem# 0: D:\ORACLE\ORADATA\MYDB\REDO4_01.RDO
Tue Nov 24 14:37:55 2015
ARC1: Log corruption near block 132988 change 521383134 time ?
Tue Nov 24 14:37:55 2015
Errors in file d:\oracle\admin\MYDB\bdump\MYDB_arc1_6612.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 132988 change 521383134 time 11/17/2015 04:06:08
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'

ARC1: All Archive destinations made inactive due to error 354
ARCH: Archival stopped, error occurred. Will continue retrying
Tue Nov 24 14:37:55 2015
Errors in file d:\oracle\admin\MYDB\bdump\MYDB_arc1_6612.trc:
ORA-16038: log 3 sequence# 16454 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'

Trace file as highlighted above might look similar to the following.
kcrrfail: dest:1 err:354 force:0 blast:1
*** 2015-11-24 14:33:55.237 20216 kcrr.c
ORA-16038: log 3 sequence# 16454 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'
*** 2015-11-24 14:35:55.237
*** 2015-11-24 14:35:55.237 60653 kcrr.c
ARC1: Log corruption near block 132988 change 521383134 time ?
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 132988 change 521383134 time 11/17/2015 04:06:08
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'
*** 2015-11-24 14:35:55.237 60653 kcrr.c
ARC1: All Archive destinations made inactive due to error 354
*** 2015-11-24 14:35:55.237 58915 kcrr.c
kcrrfail: dest:1 err:354 force:0 blast:1
*** 2015-11-24 14:35:55.253 20216 kcrr.c
ORA-16038: log 3 sequence# 16454 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'
*** 2015-11-24 14:36:55.128
*** 2015-11-24 14:36:55.128 20216 kcrr.c
*** 2015-11-24 14:37:55.253
*** 2015-11-24 14:37:55.253 60653 kcrr.c
ARC1: Log corruption near block 132988 change 521383134 time ?
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 132988 change 521383134 time 11/17/2015 04:06:08
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'
*** 2015-11-24 14:37:55.253 60653 kcrr.c
ARC1: All Archive destinations made inactive due to error 354
*** 2015-11-24 14:37:55.253 58915 kcrr.c
kcrrfail: dest:1 err:354 force:0 blast:1
*** 2015-11-24 14:37:55.268 20216 kcrr.c
ORA-16038: log 3 sequence# 16454 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'
*** 2015-11-24 14:39:55.268
*** 2015-11-24 14:39:55.268 60653 kcrr.c
ARC1: Log corruption near block 132988 change 521383134 time ?
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 132988 change 521383134 time 11/17/2015 04:06:08
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'
*** 2015-11-24 14:39:55.268 60653 kcrr.c
ARC1: All Archive destinations made inactive due to error 354
*** 2015-11-24 14:39:55.268 58915 kcrr.c
kcrrfail: dest:1 err:354 force:0 blast:1
*** 2015-11-24 14:39:55.284 20216 kcrr.c
ORA-16038: log 3 sequence# 16454 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\MYDB\REDO3_01.RDO'

It clearly means that there is corruption in the redo log file belonging to group 3. Although current log sequence number is 16492, but sequence number 16454 has not been archived because only member of log group 3 is corrupt and hence cannot be archived. Following query also confirms that sequence 16454 has not been archived.
SQL> SELECT sequence#, name FROM V$ARCHIVED_LOG ORDER BY 1;
SEQUENCE# NAME
---------- ------------------------------------------------------------------------------
     16452 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164520609542238.ARC
     16453 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164530609542238.ARC
--MISSING SEQUENCE 16445
     16455 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164550609542238.ARC
     16456 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164560609542238.ARC
     16482 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164820609542238.ARC
     16483 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164830609542238.ARC
     16484 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164840609542238.ARC
     16485 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164850609542238.ARC
     16486 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164860609542238.ARC
     16487 D:\ORACLE\ADMIN\MYDB\ARCHIVE\MYDBT001S164870609542238.ARC


Cause
This issue could be caused by storage or hardware issue causing redo logs to be corrupted. have a thorough check of the hardware and storage. OS logs may or may not show issues with the hardware or storage for this scenario.

Solution
You should always have multiple redo log member in each redo log group so that if one logfile member is corrupt, other member could be used to archive the log data. For this scenario since there is only one member, the solution is to clear this redo log file group so that it can be re-used again.
SQL> alter database clear unarchived logfile group 2;

Database altered.

Important Note: Take full database backup immediately after clearing the redo log group because archived log for sequence 16454 is not available because of redo log corruption and recovery would not be possible if a medial failure occurs at this point.

Sunday, November 22, 2015

RMAN-00600: internal error, arguments [9222] [] [] [] []

I faced following error message while creating a standby database from live database using RMAN DUPLICATE command.

C:\Users\administrator>rman target sys/pass11g@pri auxiliary sys/pass11g@aux

Recovery Manager: Release 11.2.0.3.0 - Production on Wed Sep 9 12:35:18 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=3343192461)
connected to auxiliary database: STBY (not mounted)

RMAN> run{
2>  allocate channel c1 type disk;
3>  allocate auxiliary channel standby-db type disk;
4>  duplicate target database for standby from active database;
5>  }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=695 instance=prod1 device type=DISK

allocated channel: standby-db
channel standby-db: SID=465 instance=prod2 device type=DISK

Starting Duplicate Db at 09-SEP-15

contents of Memory Script:
{
   backup as copy reuse
   ;
released channel: c1
released channel: stby
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/09/2015 12:35:46
RMAN-05501: aborting duplication of target database
RMAN-00600: internal error, arguments [9222] [] [] [] []
RMAN-01009: syntax error: found ";": expecting one of: "archivelog, as, auxiliary, backupset, backup, channel, check, controlfilecopy, copies, copy, cumulative, current, database, datafilecopy, datafile, db_file_name_convert, db_recovery_file_dest, device, diskratio, duration, filesperset, force, f
RMAN-01007: at line 3 column 4 file: Memory Script

Solution
For standby database to work properly and connect with the primary (and primary to connect with the standby), password file of primary needs to be copied to the standby database host. In my case, I forgot to do this before executing DUPLICATE command “FROM ACTIVE DATABASE”. After copying password file, my duplicate command executed successfully.



Sunday, November 15, 2015

RMAN-05001 auxiliary filename string conflicts with a file used by the target database

If this error message is being returned during RMANDUPLICATE DATABASE command, it is because RMAN would try to created destination files at the same location where source database files exist; and since it cannot overwrite existing files, so it and returns error message. To avoid this error, NOFILENAME check option should be used with DUPLICATE command. Otherwise use one of the following to specify different file names
  • Use DB_FILE_NAME_CONVERT init parameter
  • Use DB_FILE_NAME_CONVERT option in DUPLICATE command
  • SET NEWNAME  for each datafile in the RMAN run block with DUPLICATE command



Wednesday, November 11, 2015

Heartbeat failed to connect to standby

If Primary database alert log is showing error message similar to the following,
PING[ARC1]: Heartbeat failed to connect to standby 'standby-db'. Error is 16058.
it means that primary database is not able to ship redo log data to standby database because standby database instance is not mounted; which is required for shipping the redo log data to standby site.
To avoid this error and keep standby in-sync with primary, make sure that standby database instance is in mount state and is accessible from the primary database.

Thursday, November 5, 2015

Terminating the instance due to error 481

If you are trying to start a RAC instance and it is terminating immediately after STARTUP command, and alert log is showing following messages
MMNL started with pid=26, OS id=7068
NOTE: initiating MARK startup
Starting background process MARK
Wed Sep 09 15:27:46 2015
MARK started with pid=27, OS id=7624
NOTE: MARK has subscribed
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Warning: Shared server clients will not be able to connect because SHARED_SERVERS is 0
lmon registered with NM - instance number 1 (internal mem no 0)
LMON (ospid: 7920): terminating the instance due to error 481
Instance terminated by LMON, pid = 7920

According to my experience (there could be other reasons which I am not aware of), it actually means that you have set CLUSTER_DATABASE parameter value to FALSE, and other instance(s) of this RAC database is/are already started. Since CLUSTER_DATABASE is set to FALSE, this instance would not startup.
To start this instance successfully, set parameter CLUSTER_DATABASE to TRUE.
CLUSTER_DATABASE=TRUE