Friday, May 13, 2011

ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS1'


One of our databases crashed recently because the disks were taken offline by mistake – fortunately, it was a development one!

When I brought it back online and people started to use it, one of the developers got the following error message:

ERROR at line 1:
ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS1'

I had a quick look and it appeared that there were queries which must have been running at the time of the crash which required recovery. It was not possible to use the UNDO space because it was being used for recovery.

To check if the UNDO is being used you can use the following:

Select (used_ublk*8192)/1024/1024 MB
From v$transaction;

The 8192 is the block size of the database so change it if you database has a different block size.

In Oracle 10g you can look in the v$session_longops view to see how long a transaction has to roll back.

I also noticed that there were lots of log switches in the alert log and relatively high disk queues on the SAN disk which held the data for the database in question.

In udump folder there are SMON files which report:

*** 2009-04-23 16:52:04.966
Parallel Transaction recovery caught exception 30319
Parallel Transaction recovery caught error 30319

Also in udump there are P001, P002....P007 files which report:

*** SERVICE NAME:(SYS$BACKGROUND) 2009-04-23 16:56:05.726
*** SESSION ID:(1088.37) 2009-04-23 16:56:05.726
Parallel Transaction recovery server caught exception 10388

Next, it's a nice idea to work out which objects are being recovered.

This query shows the parallel query processes and slaves which are running:

select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from
gv$px_session px,
gv$session s
where
px.sid=s.sid (+)
and
px.serial#=s.serial#
order by 5 , 1 desc ;

Take the SID from the above query - there may be multiple ones - and put it into the query below:

select *
from gv$active_session_history
where session_id in (1098);

If you look at the CURRENT_OBJ# column, it will show you the object_id for the object which is being recovered.

It was still not possible to run the insert statement that the developer wanted to run so I investigated further.

If you also look in the gv$active_session_history view, you will see the column EVENT. This was showing as "wait for stopper event to be increased".
I found Metalink note: 464246.1 "Database Hang Due to parallel Transaction Recovery". If you can't see that, here is a summary:

=> Sometimes Parallel Rollback of Large Transaction may become very slow
=> Usually occurs when killing a long running process or aborting the database (this is what happened in my case)
=> In fast-start parallel rollback, the background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes.
=> When the amount of uncommitted data is above a certain threshold, and the database has to roll back, parallel recovery kicks in and uses SMON as the recovery co-ordinator.
=> Parallel slaves can contend with each other when trying to roll back in parallel, so the process may be faster in serial. To do so, use:

ALTER SYSTEM fast_start_parallel_rollback = false;

If you execute this command while it is trying to recover, it seems that further logons are not permitted. I received the following error when attempting to log on:

ORA-03113: end-of-file on communication channel

However, after some time it seemed to right itself. If you are interested in monitoring your UNDO usage, or monitoring your TEMP space usage, I would recommend you read my other articles on the subject.

1 comment:

  1. Whoever the owner of this site is, you need to remove the completely duplicated content that is from my site. www.ora00600.com.

    Yes, you link back to my site but you have copied it word for word. Please remove it.

    ReplyDelete