Tuesday, August 19, 2014

WARNING: inbound connection timed out (ORA-3136)

The warning message "inbound connection timed out (ORA-3136)" can be seen in the alert log file if a client fails to authenticate itself within a specified period of time after the connection was first initiated. The sqlnet.ora parameter SQLNET.INBOUND_CONNECT_TIMEOUT specifies, in seconds, how long the database waits for the connection to complete authentication before logging this warning message in the alert log file.

This parameter has no default value in 9i, but it defaults to 60 seconds in later versions. It serves a security purpose: it raises an alert (by issuing the warning) if someone initiates a connection to the database, the initial attempt is turned down (for example because of a wrong password), and the session does not complete authentication within the specified period of time. Along with the warning message in the alert log file, the IP address of the client is also logged in the SQLNET.LOG file.

The following shows how this warning message can be deliberately produced.

In my 10.2.0.4 database, I have not set any value for SQLNET.INBOUND_CONNECT_TIMEOUT in SQLNET.ORA, so it has the default value of 60 seconds. In the following image, the upper screen shows my initial attempt to connect to the database with a wrong password, and in the lower screen, exactly 60 seconds after that initial attempt, the warning message is logged in the alert log file.

If you don't want to see this kind of warning in the alert log file, you can set the value of this parameter to 0, which means the database will wait indefinitely for the session to be authenticated without logging any warning message.
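For reference, this is how the parameter could be set explicitly in sqlnet.ora on the database server; the values below are only illustrative choices, and the listener has its own, similarly named INBOUND_CONNECT_TIMEOUT_<listener_name> parameter in listener.ora which is usually reviewed alongside it.

#######################################################
# sqlnet.ora on the database server (illustrative value, not a recommendation)
SQLNET.INBOUND_CONNECT_TIMEOUT = 120

# listener.ora -- for a listener named LISTENER (illustrative value)
INBOUND_CONNECT_TIMEOUT_LISTENER = 110
#######################################################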

Wednesday, August 13, 2014

ORA-01113 when Tablespace is in Backup Mode

On a 9.2.0.4 database, logging in was throwing the error "ORA-01033: ORACLE initialization or shutdown in progress". This error means that the database is in either NOMOUNT or MOUNT state. So I logged in as SYSDBA and tried to open the database.

/////////////////////////////////////////////////////////////////////////////////////////
SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PRIMARY

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 10 needs media recovery
ORA-01110: data file 10: 'D:\ORACLE\ORADATA\PRODDB\EX_DATA01.DBF'
/////////////////////////////////////////////////////////////////////////////////////////

Apparently this suggested that the datafile might be corrupted and in need of media recovery. But when I checked the alert log file, it showed the following messages, which confirmed that the datafile was not actually corrupted; rather, the tablespace had been left in BACKUP mode.

/////////////////////////////////////////////////////////////////////////////////////////
ALTER TABLESPACE "PROD_DATA" END BACKUP
Sun Aug 10 03:24:04 2014
Completed: ALTER TABLESPACE "UAT_DATA" END BACKUP
Sun Aug 10 03:24:04 2014
ALTER TABLESPACE "EX_DATA" BEGIN BACKUP
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.4.0.
System parameters with non-default values:
  processes                = 512
  timed_statistics         = TRUE
/////////////////////////////////////////////////////////////////////////////////////////

Here I realized that there had been an instance crash while tablespace EX_DATA was in backup mode. Querying v$backup returned the following information.

/////////////////////////////////////////////////////////////////////////////////////////
SQL> select * from v$backup where file#=10;

     FILE# STATUS                CHANGE# TIME
---------- ------------------ ---------- ---------
        10 ACTIVE                6016417 10-AUG-14

/////////////////////////////////////////////////////////////////////////////////////////

Solution
Taking tablespace EX_DATA out of backup mode resolved the issue.

/////////////////////////////////////////////////////////////////////////////////////////
SQL> alter tablespace ex_data end backup;

Tablespace altered.

SQL> alter database open;

Database altered.
/////////////////////////////////////////////////////////////////////////////////////////
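In this case only one tablespace was affected, but if an instance crashes while several tablespaces are in hot backup mode, a query along the following lines (a sketch joining v$backup, v$datafile and v$tablespace, which are read from the control file and therefore available while the database is only mounted) could be used to generate the required END BACKUP statements.

/////////////////////////////////////////////////////////////////////////////////////////
SQL> -- sketch: generate END BACKUP commands for every tablespace
SQL> -- that still has a datafile in backup mode
SQL> select distinct 'alter tablespace ' || t.name || ' end backup;' cmd
       from v$backup b, v$datafile d, v$tablespace t
      where b.file# = d.file#
        and d.ts#   = t.ts#
        and b.status = 'ACTIVE';
/////////////////////////////////////////////////////////////////////////////////////////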

I tried to reproduce the same problem on 11.2.0.1 and saw that in 11g, the error message for the same scenario clearly hints at checking whether the datafile is in BACKUP mode.

ORA-10873: file 4 needs to be either taken out of backup mode or media
recovered
ORA-01110: data file 4: 'D:\APP\SQURESHI\ORADATA\SALMAN112\USERS01.DBF'

I am not sure whether 10g returns the same error message, but at least from 11g onwards there is a clear indication that the datafile may be in backup mode, rather than simply the impression that the datafile is corrupted.

Thursday, August 7, 2014

Excessive PGA Memory Usage by Sessions and System Hang

One day in 2006, users (developers) started complaining that a development database (a 9i database on RHEL 4) had become very slow. I was not even able to log into the server to check the situation as it was hung, and the only solution was to hard-restart the server.
It happened again, and again, and again. It was a challenge to find out what was going wrong and where. After some "googling", I was able to guess that something was happening with RAM/swap space usage which was causing the system to hang and eventually require a hard reset. Since the issue was happening sporadically, it was not easy to catch the culprit.

I decided to configure a job which would run at a regular interval and capture the value of the PGA_ALLOC_MEM column of the v$process dynamic performance view, along with other session details from v$session, and insert them into a table with a timestamp for each capture. So, the next time the problem occurred, I was able to find a session which had allocated 4G of PGA memory at the moment the system last crashed. Further digging, and tracing the sessions connecting from that module, showed that a bug in the module was causing a loop to run millions of times, opening cursors and hence eating up all the memory and swap space.

Over time, I have used the same method in different scenarios, and it has enabled me to find issues with applications as well as some Oracle-related bugs (for example, when the PGA memory of an Oracle internal process keeps increasing and eventually depletes all memory because of a memory leak).

Query
The following query can be used to check which processes are allocating the most PGA memory.

#######################################################
set lines 500
set pages 1000
col program format a20
col module format a20
col machine format a40

select a.spid OS_PROCESS_ID, b.sid, b.serial#, b.username, b.schemaname, b.osuser, b.machine, b.status, b.server,
       b.program, b.module,
       round(a.PGA_ALLOC_MEM/1024/1024,2) "Current PGA Allocated (MB)",
       round(a.PGA_USED_MEM/1024/1024,2)  "Current PGA Memory in Use (MB)"
  from v$process a, v$session b
 where a.addr = b.PADDR
 order by a.PGA_ALLOC_MEM;

#######################################################
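Alongside this per-process view, v$pgastat gives an instance-wide picture of PGA usage. A quick sanity check like the sketch below (v$pgastat values are reported in bytes, converted here to MB) can show whether the overall allocation is drifting upwards.

#######################################################
-- instance-wide PGA figures (v$pgastat values are in bytes)
select name, round(value/1024/1024,2) value_mb
  from v$pgastat
 where name in ('total PGA allocated', 'total PGA inuse', 'maximum PGA allocated');

#######################################################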

Scheduled Job
The following steps can be used to set up a job which will collect PGA memory information on a regular basis (every 10 minutes in this case) and store it in a table "PGA_MEMORY_USAGE".

#######################################################
create table PGA_MEMORY_USAGE as
select logon_time, b.sid, round(a.pga_alloc_mem/1024/1024,2) PGA_ALLOCATED_MEMORY,
       b.username, b.machine, b.module
  from v$process a, v$session b
 where a.addr = b.paddr
   and a.pga_alloc_mem > 20000000   -- seed only with sessions currently using more than ~20 MB of PGA
 order by a.pga_alloc_mem;

alter table PGA_MEMORY_USAGE add(snapshot_time date);

variable j number

-- schedule the capture to run every 10 minutes (sysdate + 1/144)
begin
  dbms_job.submit(:j,
    what      => 'begin insert into PGA_MEMORY_USAGE select logon_time,b.sid,round(a.pga_alloc_mem/1024/1024,2),b.username,b.machine,b.module,sysdate from v$process a, v$session b where a.addr=b.paddr order by a.pga_alloc_mem; commit; end;',
    next_date => sysdate,
    interval  => 'sysdate + 1/144');
end;
/
commit;

SQL> print :j

         J
----------
        21

#######################################################
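Once submitted, the job and its interval can be verified with a quick look at DBA_JOBS, for example:

#######################################################
-- confirm the job number, next run time and interval
select job, next_date, interval, what
  from dba_jobs
 where job = :j;
#######################################################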

Now the following query can be used to see the captured PGA memory usage by session, including when the memory of a particular session started growing and by how much.

alter session set nls_date_format='DD-MON-YY HH24:MI:SS';

select * from PGA_MEMORY_USAGE order by PGA_ALLOCATED_MEMORY;
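To narrow things down, a summary like the following sketch (against the same PGA_MEMORY_USAGE table) shows how much each session grew between its first and last captured snapshot.

#######################################################
-- per-session PGA growth across the captured snapshots
select sid, logon_time, username, module,
       min(PGA_ALLOCATED_MEMORY) min_pga_mb,
       max(PGA_ALLOCATED_MEMORY) max_pga_mb,
       max(PGA_ALLOCATED_MEMORY) - min(PGA_ALLOCATED_MEMORY) growth_mb
  from PGA_MEMORY_USAGE
 group by sid, logon_time, username, module
 order by growth_mb;
#######################################################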


Remove Job
Once you have found the culprit, the following commands can be used to remove the job and drop the table.

#######################################################
exec dbms_job.remove(21);
commit;

drop table  PGA_MEMORY_USAGE;

#######################################################

MOS note 822527.1 further explains how to check which part of the PGA memory is growing, so that the issue can be diagnosed more easily.
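On 10.2 and later, where the v$process_memory view is available, the PGA of a suspect process can also be broken down by category; a rough sketch (the &os_process_id substitution variable is just a placeholder for the SPID found with the earlier query):

#######################################################
-- 10.2 and later: break down a suspect process's PGA by category
-- (&os_process_id is the SPID reported by the earlier query)
select p.spid, pm.category,
       round(pm.allocated/1024/1024,2) allocated_mb,
       round(pm.used/1024/1024,2) used_mb
  from v$process p, v$process_memory pm
 where p.pid = pm.pid
   and p.spid = '&os_process_id'
 order by pm.allocated;
#######################################################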