Monday, September 22, 2014

Issues with Bind Variables and Bind Peeking

I would explain bellow that how bind peeking played a disastrous role  in one of my database. For usage of bind variables, a good post by Arup Nanda is available here

From one of my OLTP databases (10.2.0.4), suddenly I started receiving OEM alerts for “run queue length” as follows. (My server has total 24 CPUs)

------------------------------------------------
Target Name=<TARGET_NAME>
Target type=Host
Host=<HOST_NAME>
Occurred At=11-Sep-2014 06:04:13 SGT
Message=CPU Load is 27.08, crossed warning (18) or critical (24) threshold.
Metric=Run Queue Length (5 minute average)
Metric value=27.08
Severity=Critical
Acknowledged=No
Notification Rule Name=Host Availability and Critical States
Notification Rule Owner=SYSMAN
------------------------------------------------

While further investigating the issue, I found that most of the time was consumed by a query which was part of every transaction (at the particular time) executed by each session. This meant that each session was executing this query and hence overall most of the resources consumption was only by this single query.  In OEM (or you can query v$sql_plan using SQL_ID), I could see multiple plans listed for this SQL and I suspected that perhaps this current plan caused this query to take longer time to finish, and since each session had to execute this query, session would not return until this query finishes. Since each session was executing this SQL, the session was not leaving the CPU and new sessions were waiting for the CPU to be released and eventually raising “Run Queue Length”.

Since my SQL was using bind variables, immediate thing which came into my mind was that probably this SQL was aged out of shared pool and when it was reloaded, it loaded a new execution plan (the bad one) into the shared pool (plan could change because of bind peeking). So, I issued “alter system flush shared pool” (not a good idea actually to invalidate all cursors), to invalidate the cursors in the shared pool and let it reload all cursors again (including this SQL) and saw if it load a good plan this time. This thing worked; as I saw a new plan for this SQL and “run queue length” immediately came down to less than 10.

From dba_hist_sql_plan, I found that there were total 6 different execution plans for this SQL and then I used awrsqrpt.sql script to fetch the execution details for this SQL against each of this plan from AWR repository, and following was the interesting detail which I extracted from SQL AWR reports

Plan with hash value 465940006, Average Physical Reads per execution => 319
Plan with hash value 1122690139, Average Physical Reads per execution => 36
Plan with hash value 976138646, Average Physical Reads per execution => 1285
Plan with hash value 2677530476, Average Physical Reads per execution => 144
Plan with hash value 3841102669, Average Physical Reads per execution => 37
Plan with hash value 2920758856, Average Physical Reads per execution => 155

You can see that plan 976138646 is doing a lot of physical reads and this was actually the culprit at this time as it was slowing down the execution of this particular SQL. After I flushed the shared pool, this plan (976138646) was gone and next plan which was loaded was 1122690139 which made the things better for me. The other better plan here is 3841102669 which was also in use in the past.

Solution for the problem
Invalidating the cursor can help to reload the new and better plan, but this is not the permanent solution. Starting 11g, adaptive cursor sharing feature is very helpful as Oracle may select a different execution plan based on the values in bind variables during different executions of same SQL.

In 10g, oracle profiles and starting from 11g, sql plan management features could be used to attach the plans with particular SQLs to avoid this kind of situation.

Tuesday, September 9, 2014

Privileges Analysis in 12c

In Oracle 12c, we can do an analysis of privileges used by the users. This analysis can actually help to know if there are some privileges granted to some user which were never used by that user and hence can be revoked. This can be even more useful in a situation where DBA realizes that someone (probably previous DBA) has granted a lot of privileges to some users which are apparently not required but may be a threat to the security of the data and he can do a privilege analysis to decide which privileges were never used and can be revoked.

There are three steps involved in privilege analysis
Create capture (DBMS_PRIVILEGE_CAPTURE.CREATE_CAPTURE)
Enable capture (DBMS_PRIVILEGE_CAPTURE.ENABLE_CAPTURE)
Disable capture (DBMS_PRIVILEGE_CAPTURE.DISABLE_CAPTURE)
Generate result (DBMS_PRIVILEGE_CAPTURE.GENERATE_RESULT)

Following is the list of data dictionary views which would contain the analysis information
DBA_USED_PUBPRIVS
DBA_USED_OBJPRIVS
DBA_USED_SYSPRIVS
DBA_USED_PRIVS
DBA_USED_OBJPRIVS_PATH
DBA_USED_SYSPRIVS_PATH
DBA_UNUSED_OBJPRIVS
DBA_UNUSED_SYSPRIVS
DBA_UNUSED_PRIVS
DBA_UNUSED_OBJPRIVS_PATH
DBA_UNUSED_SYSPRIVS_PATH

Privileges analysis can be done at 1) Database Level, 2) Role level, 3) Context level and 4) Role + Context level.

Here I will be doing Context level (Based on logged in USER) privileges analyses to check the privileges usage by a single database user.

Creating a new user (in a PDB) and grant privileges

###########################################
SQL> create user salman identified by salman;
User created.

SQL> alter user salman default tablespace users quota unlimited on users;
User altered

SQL> grant connect to salman;

Grant succeeded.
SQL> grant resource to salman;

Grant succeeded.
SQL> grant create table to salman;

Grant succeeded.
SQL> grant select any table to salman;

Grant succeeded.
 ###########################################

Create capture
exec dbms_privilege_capture.create_capture(name=>'CHECK_PRIVILEGES', Description=>' Capture privileges for user SALMAN', type=>dbms_privilege_capture.g_context, condition=> 'SYS_CONTEXT(''USERENV'',''SESSION_USER'')=''SALMAN''');

Database level capture can be created as follows.
exec dbms_privilege_capture.create_capture(name=>'CHECK_PRIVILEGES', Description=>' Capture privileges for whole database', type=>dbms_privilege_capture.g_database);

For capture based on Module from the session context, following method would be used
Exec SYS.DBMS_PRIVILEGES_CAPTURE(- NAME=>’Privs_AcctPayable_capture’, - Description=> ‘All privileges used by module’, - Type=> ‘SYS_CONTEXT(“USERENV”, “MODULE”)= “SQLPLUS’)

Enable Capture
exec  dbms_privilege_capture.enable_capture('CHECK_PRIVILEGES')

Log in as user SALMAN and utilize a couple of privileges granted to it.

SQL> conn salman/salman@pdb1
Connected.
SQL> create table test (id number);

Table created.

SQL> insert into test values (1);

1 row created.

SQL> commit;

Commit complete.

Disable the capture
SQL> exec dbms_privilege_capture.disable_capture('CHECK_PRIVILEGES')

PL/SQL procedure successfully completed.

Generate Results
SQL> exec dbms_privilege_capture.generate_result('CHECK_PRIVILEGES')

PL/SQL procedure successfully completed.

SQL> desc dba_used_sysprivs
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CAPTURE                                   NOT NULL VARCHAR2(128)
 SEQUENCE                                  NOT NULL NUMBER
 OS_USER                                            VARCHAR2(128)
 USERHOST                                           VARCHAR2(128)
 MODULE                                             VARCHAR2(64)
 USERNAME                                  NOT NULL VARCHAR2(128)
 USED_ROLE                                          VARCHAR2(128)
 SYS_PRIV                                           VARCHAR2(40)
 ADMIN_OPTION                                       NUMBER

SQL>  select module,used_role,sys_priv from dba_used_privs where capture='CHECK_PRIVILEGES' and username='SALMAN';

MODULE                                   USED_ROLE  SYS_PRIV
---------------------------------------- ---------- ----------------------------------------
sqlplus@salman1.salman.com (TNS V1-V3)   CONNECT    CREATE SESSION
sqlplus@salman1.salman.com (TNS V1-V3)   PUBLIC
sqlplus@salman1.salman.com (TNS V1-V3)   PUBLIC
sqlplus@salman1.salman.com (TNS V1-V3)   PUBLIC
sqlplus@salman1.salman.com (TNS V1-V3)   PUBLIC
SQL*Plus                                 PUBLIC
SQL*Plus                                 RESOURCE   CREATE TABLE
SQL*Plus                                 RESOURCE   CREATE TABLE

SQL> desc dba_unused_sysprivs
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CAPTURE                                   NOT NULL VARCHAR2(128)
 USERNAME                                           VARCHAR2(128)
 ROLENAME                                           VARCHAR2(128)
 SYS_PRIV                                           VARCHAR2(40)
 ADMIN_OPTION                                       NUMBER


SQL>  select sys_priv from dba_unused_sysprivs where capture='CHECK_PRIVILEGES' and username='SALMAN';

SYS_PRIV
----------------------------------------
SET CONTAINER
CREATE PROCEDURE
CREATE CLUSTER
CREATE INDEXTYPE
CREATE OPERATOR
CREATE TRIGGER
CREATE SEQUENCE
SELECT ANY TABLE
CREATE TYPE


Drop Capture
exec dbms_privilege_capture.drop_capture('CHECK_PRIVILEGES') – Name is case sensitive

So after all this exercise, we can easily identify which privileges are not in use. DBAs can enable the capture for a longer period of time (may be for a few weeks) to avoid any chance of mistake as there could be a possibility that after revoke an “unnecessary” privilege, suddenly a bi-weekly report stops working because privilege(s) needed by this report was listed under “dba_unused_sysprivs “ as this privilege was not in use during capture was enabled

Tuesday, August 19, 2014

WARNING: inbound connection timed out (ORA-3136)

Warning message inbound connection timed out (ORA-3136) can be seen in the alert log file if a client was failed to authenticate itself within a specified period of time, since connection was initiated first time. SQLNET.INBOUND_CONNECT_TIMEOUT parameter is used to specify a value in number of seconds which database would wait for connection to be established, before logging this warning message in alert log file.

This parameter has no default value set in 9i, but has a default value of 60 seconds in later versions. This parameter serves for the security, and alerts (by issuing warning), if someone tries to connect with the database and initial connection effort was turned down (because of wrong password) and session has not tried to reconnect during this specified period of time. With the warning message in alert log file, the IP address of the client is also logged in SQLNET.LOG file.

In the following, you can see how this WARNING message can be deliberately produced.

In my 10.2.0.4 database, I have not set any value for SQLNET.INBOUND_CONNECT_TIMEOUT in SQLNET.ORA and hence it has default value of 60 seconds. In the following image, upper screen shows my initial attempt to connect with the database with wrong password, and in lower screen, exactly after 60 seconds of initial session attempt, this warning message is logged in alert log file.

If you don't want to see this kind of warnings in alert log file, you can set the value of this parameter to 0, which would mean that database would indefinitely wait for the session to be authenticated without logging any error message

Wednesday, August 13, 2014

ORA-01113 when Tablespace is in Backup Mode

On a 9.2.0.4 database, while logging in to the database, it was throwing error "ORA-01033: ORACLE initialization or shutdown in progress". This error means that database is either no-mount or mount state. So, I logged in as SYSDBA and tried to open the database.

/////////////////////////////////////////////////////////////////////////////////////////
SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PRIMARY

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 10 needs media recovery
ORA-01110: data file 10: 'D:\ORACLE\ORADATA\PRODDB\EX_DATA01.DBF'
/////////////////////////////////////////////////////////////////////////////////////////

Apparently it was showing that datafile might have been corrupted and needs media recovery. But when I checked the alert log file, it was showing a message as follows which confirmed that datafile is actually not corrupted, but, tablespace is BACKUP MODE.

/////////////////////////////////////////////////////////////////////////////////////////
ALTER TABLESPACE "PROD_DATA" END BACKUP
Sun Aug 10 03:24:04 2014
Completed: ALTER TABLESPACE "UAT_DATA" END BACKUP
Sun Aug 10 03:24:04 2014
ALTER TABLESPACE "EX_DATA" BEGIN BACKUP
Cc
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.4.0.
System parameters with non-default values:
  processes                = 512
  timed_statistics         = TRUE
/////////////////////////////////////////////////////////////////////////////////////////

Here I realized that that there was an instance crash while tablespace EX_DATA was in backup mode. Querying v$backup returned me following information

/////////////////////////////////////////////////////////////////////////////////////////
SQL> select * from v$backup where file#=10;

     FILE#   STATUS                CHANGE#   TIME
         10     ACTIVE                6016417       10-AUG-14

/////////////////////////////////////////////////////////////////////////////////////////

Solution
Taking tablespace EX_DATA out of back up mode resolved the issue.

/////////////////////////////////////////////////////////////////////////////////////////
SQL> alter tablespace ex_data end backup;

Tablespace altered.

SQL> alter database open;

Database altered.
/////////////////////////////////////////////////////////////////////////////////////////

I tried to reproduce same problem on 11.2.0.1 and I saw that in 11g, the error message for same type of scenario clearly gives hint of checking the datafile for being in BACKUP MODE.

ORA-10873: file 4 needs to be either taken out of backup mode or media
recovered
ORA-01110: data file 4: 'D:\APP\SQURESHI\ORADATA\SALMAN112\USERS01.DBF'

I am not sure if 10g returns the same error message, but, at least starting from 11g there is a clear indication of possibility of the datafile being under backup mode rather than simply giving impression that datafile is corrupted.