Thursday, May 25, 2017

Configuring Oracle ASM Filter Driver (ASMFD) and Labeling Disks


If you already have 12.2.0.1 and above GI/RAC installation and you want to migrate to ASMFD, click here.

Starting 12.2.0.1, downloaded Grid Infrastructure (GI) software zip is extracted directly in to the directly we have already chosen as ORACLE_HOME for GI. After extraction, we execute $ORACLE_HOME/gridSetup.sh script to initiate the setup wizard. If we want to use ASMFD (ASM Filter Driver), then after we have provisioned the disk/partition at OS level, we can configure ASMFD as follows. We no longer need ASMLib to setup ASM disks once we start using ASMFD.

For this example, I already have created a disk partition /dev/sdb1 which I will be labelling using ASMFD during the configuration of GI. As root user, configure the ASMFD as follows. /u01/app/12.2.0/grid is where I have extracted the GI software.
[root@salman11 ~]# export ORACLE_HOME=/u01/app/12.2.0/grid
[root@salman11 ~]# export ORACLE_BASE=/tmp

[root@salman11 ~]# $ORACLE_HOME/bin/asmcmd afd_label DATA /dev/sdb1 --init
[root@salman11 ~]# $ORACLE_HOME/bin/asmcmd afd_lslbl /dev/sdb1
--------------------------------------------------------------------------------
Label                     Duplicate  Path
=====================================================================
DATA                                  /dev/sdb1

[root@salman11 ~]# unset ORACLE_BASE


Now during GI configuration setup (initiate $ORACLE_HOME/gridSetup.sh), you will see following screen where this new disk will be listed. Select this disk and click the checkbox “Configure Oracle ASM Filter Driver”. ASM Filter Driver will be configured automatically during the setup and you can start using ASMFD for your future ASM disks configurations. As stated above, now you are no longer required to use ASMLib.

Thursday, May 18, 2017

Disabling Transparent Huge Pages on Linux 6 and Linux 7

Oracle recommends that Transparent Huge Pages should be disabled before installation of Oracle software. In the following I will explain how to disable Transparent Huge Pages on Linux 6 and Linux 7. I am using Oracle Enterprise Linux where I see that this feature is already disabled. You can check for your release of Linux whether it is disabled or not, and disable it using method explained here. This is the official document for doing this task.

Check if Transparent Huge Pages is Enabled
For Red Hat Linux, issue following command
# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled

For other Linux distributions, use issue following command
# cat /sys/kernel/mm/transparent_hugepage/enabled

For Red hat, if you don’t see directory “redhat_transparent_hugepage”, and for other distributions, directory “transparent_hugepage” does not exist, it means Transparent Huge Pages is already disabled
If Transparent Huge Pages is enabled, above command should return following output.
[always] never

Disabling Transparent Huge Pages 
Add following entry in /etc/grub.conf file for Linux 6, and /etc/grub2.cfg for Linux 7. This entry needs to be made for every kernel listed in the grub configuration file which you use for booting the system.
transparent_hugepage=never

For example, on my current Oracle Linux 7, I have 3 kernels that I can use to boot my OS, and Transparent huge Pages is already disabled, but you will need to make this entry if entry is not already there. Following are these entries from my server.

menuentry 'Oracle Linux Server, with Linux 3.10.0-123.el7.x86_64' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnul
inux-3.10.0-123.el7.x86_64-advanced-c0c94d8d-4dac-4926-89a1-12f5ae2659f5' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod xfs
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  9ca469b8-017b-498
e-a5af-5b0378ee8b75
        else
          search --no-floppy --fs-uuid --set=root 9ca469b8-017b-498e-a5af-5b0378ee8b75
        fi
        linux16 /vmlinuz-3.10.0-123.el7.x86_64 root=UUID=c0c94d8d-4dac-4926-89a1-12f5ae2659f5 ro vconsole.keymap=us crashkernel=auto vconsole.font=latarcyrheb-sun1
6 rhgb quiet numa=off transparent_hugepage=never
        initrd16 /initramfs-3.10.0-123.el7.x86_64.img
}


menuentry 'Oracle Linux Server, with Unbreakable Enterprise Kernel 3.8.13-35.3.1.el7uek.x86_64' --class oracle --class gnu-linux --class gnu --class os --unrestric
ted $menuentry_id_option 'gnulinux-3.8.13-35.3.1.el7uek.x86_64-advanced-c0c94d8d-4dac-4926-89a1-12f5ae2659f5' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod xfs
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  9ca469b8-017b-498
e-a5af-5b0378ee8b75
        else
          search --no-floppy --fs-uuid --set=root 9ca469b8-017b-498e-a5af-5b0378ee8b75
        fi
        linux16 /vmlinuz-3.8.13-35.3.1.el7uek.x86_64 root=UUID=c0c94d8d-4dac-4926-89a1-12f5ae2659f5 ro vconsole.keymap=us crashkernel=auto vconsole.font=latarcyrhe
b-sun16 rhgb quiet numa=off transparent_hugepage=never
        initrd16 /initramfs-3.8.13-35.3.1.el7uek.x86_64.img
}


menuentry 'Oracle Linux Server, with Linux 0-rescue-f58182a4af154aa38f194b45455927b1' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menue
ntry_id_option 'gnulinux-0-rescue-f58182a4af154aa38f194b45455927b1-advanced-c0c94d8d-4dac-4926-89a1-12f5ae2659f5' {
        load_video
        insmod gzio
        insmod part_msdos
        insmod xfs
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  9ca469b8-017b-498
e-a5af-5b0378ee8b75
        else
          search --no-floppy --fs-uuid --set=root 9ca469b8-017b-498e-a5af-5b0378ee8b75
        fi
        linux16 /vmlinuz-0-rescue-f58182a4af154aa38f194b45455927b1 root=UUID=c0c94d8d-4dac-4926-89a1-12f5ae2659f5 ro vconsole.keymap=us crashkernel=auto vconsole.f
ont=latarcyrheb-sun16 rhgb quiet numa=off transparent_hugepage=never
        initrd16 /initramfs-0-rescue-f58182a4af154aa38f194b45455927b1.img
}


Reboot the server so that changes become effective.

Sunday, May 14, 2017

Fatal NI connect error 12537

This error may appear in the alert log file. Following is the snippet from the alert log file.
Fatal NI connect error 12537, connecting to:
 (LOCAL=NO)

  VERSION INFORMATION:
            TNS for Linux: Version 11.2.0.4.0 - Production
            Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
            TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
  Time: 15-MAY-2017 09:52:12
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12537
   
TNS-12537: TNS:connection closed
    ns secondary err code: 12560
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
opiodr aborting process unknown ospid (24830) as a result of ORA-609
Mon May 15 10:19:04 2017
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Mon May 15 11:27:57 2017
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Mon May 15 11:51:06 2017


***********************************************************************

Fatal NI connect error 12537, connecting to:
 (LOCAL=NO)

  VERSION INFORMATION:
            TNS for Linux: Version 11.2.0.4.0 - Production
            Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
            TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
  Time: 15-MAY-2017 11:51:06
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12537
   
TNS-12537: TNS:connection closed
    ns secondary err code: 12560
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
opiodr aborting process unknown ospid (24273) as a result of ORA-609

This is because of connection closed by the database server because client could not complete the request of session and database server dropped the connection/session. This happens during initial phase of session when client and server processes are communicating and shaking hands after authentication of the client. There can be several reasons for this. One reason could be that client itself aborted (i.e. client rebooted, or network disconnection), another reason could be that database server was too busy and could not complete connection request timely and client disconnected before session could be established. There could be several of other reasons.
Adding parameter SQLNET.INBOUND_CONNECT_TIMEOUT in sqlnet.ora file and INBOUND_CONNECT_TIMEOUT_listener_name in listener.ora file can help us eradicating these errors. Values of these parameters are in number of seconds. Suppose if we set the values to 60, it would mean that database/listener would wait for 60 seconds time during which client can be authenticated before dropping the session request and logging error in the alert log file (as shown above). After setting these parameters, reload the listener (lsnrctl reload <listener_name>).
If adding these parameters does not solve your problem, you may seek help of oracle support by submitting trace files (after enabling sql trace).
These errors could also be suppressed by adding following parameter in the sqlnet.ora file

DIAG_ADR_ENABLED = OFF

--Reload the listener after adding above line.
lsnrctl reload <listener_name>
 Now these errors should not appear in alert log file, and should go to sqlnet.log instead that can be further used for investigating about these errors.

Tuesday, May 9, 2017

Error 1017 received logging on to the standby

Most probable reason for this error appearing in the alert log file of standby database is that primary database is not able to connect to the standby database to ship redo log data because the password for the SYS user in the standby database is not in sync with the primary database. Standby database is also not able to fetch the archive gap because of same discrepancy. Full error stack in the alert log file would look similar to the following.
Media Recovery Log /fra/mydb/archive/MYDB0001_0000010397_0893868278.ARC
Media Recovery Waiting for thread 1 sequence 10398
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
returning error ORA-16191
------------------------------------------------------------
FAL[client, USER]: Error 16191 connecting to MY_DB_HOST for fetching gap sequence

As a standard procedure, whenever you change SYS password in the standby database, you should always copy password file from primary database to the standby database(s). In case of RAC, copy password file to all standby RAC nodes. Since password for SYS user (and other SYSDBAs) are stored in the password file, changing password in primary would not replicate password change to the standby database along with other changes made within the database and therefore password file needs to be copied manually.
To solve this problem (and also to avoid in future), always remember to copy the passwordfile from primary database to the standby database after you perform a password change activity for SYS user in the primary database.

Password file difference may also cause to appear in the alert log file as explained here.

Friday, May 5, 2017

File #... added to control file as 'UNNAMED000...'

On your standby database if you see a datafile with the name similar to ‘UNNAMED000’, it means that you have STANDBY_FILE_MANAGEMENT parameter set to auto and managed recovery process tried to create a new datafile (after you already added this file on primary database), but it could not create file because of some reason. I faced this error because my ASM diskgroup was full. Following were the entries in my alert log file of my standby database after adding a datafile in my primary database using command ‘ALTER TABLESPACE my_tablespace ADD DATAFILE ‘+DATA/datafile06.dbf’ SIZE 20g’
WARNING: File being created with same name as in Primary
Existing file may be overwritten
Errors in file /u01/app/oracle/diag/rdbms/mydb/mydb/trace/mydb_mrp0_12030.trc:
ORA-01119: error in creating database file '+DATA/datafile06.dbf'
ORA-17502: ksfdcre:4 Failed to create file +DATA/datafile06.dbf'
ORA-15041: diskgroup "DATA" space exhausted
File #19 added to control file as 'UNNAMED00019'.
Originally created as:
'+DATA/datafile06.dbf'
Recovery was unable to create the file as:
'+DATA/datafile06.dbf'
Errors with log /arch/mydb/archive/MYDB0001_0000004740_0874013687.ARC
MRP0: Background Media Recovery terminated with error 1274
Errors in file /u01/app/oracle/diag/rdbms/mydb/mydb/trace/mydb_mrp0_12030.trc:
ORA-01274: cannot add datafile '+DATA/datafile01.dbf' - file could not be created

MRP (media recovery process) also stopped after this error as recovery could not continue after this error. Trying to start media recovery would return following error
Errors in file /u01/app/oracle/diag/rdbms/mydb/mydb/trace/mydb_dbw0_11195.trc:
ORA-01186: file 19 failed verification tests
ORA-01157: cannot identify/lock data file 19 - see DBWR trace file
ORA-01111: name for data file 19 is unknown - rename to correct file
ORA-01110: data file 19: '/u01/app/oracle/product/11203/db1/dbs/UNNAMED00019'
File 19 not verified due to error ORA-01157
Media Recovery failed with error 1111

V$datafile view in standby database showed file name as follows
SQL> select name from v$datafile';

NAME
--------------------------------------------------------------------------------



/u01/app/oracle/product/11203/db1/dbs/UNNAMED00019

19 rows selected.


There may be different reasons for file not being created in standby database and to solve this issue, you would need to find out what caused this datafile creation operation failure and resolve the issue first. As explained above that in my scenario my ASM diskgroup was left with less than 20G space that caused this failure. To resolve this issue, I added space in my ASM diskgroup so that I have enough space for adding the new datafile, and then executed following steps.

--Set standby_file_management to manual
SQL> alter system set standby_file_management=MANUAL scope=memory;

System altered

-- Re-Create affected datafile as follows
SQL> alter database create datafile '/u01/app/oracle/product/11203/db1/dbs/UNNAMED00019' as '+DATA/datafile06.dbf';

Database altered.

--Set standby_file_management back to auto
SQL> alter system set standby_file_management=auto scope=memory;

System altered

-- Now we can see our datafile listed under v$datafile
SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------



+DATA/datafile06.dbf

19 rows selected.

SQL>

-- Start Managed Recovery
SQL> alter database recover managed standby database using current logfile disconnect;