SUMMARY, part 2, Semaphore Error

From: Sal Serafino <serafino_at_cshl.edu>
Date: Wed May 11 2005 - 11:31:59 EDT
A little long-winded, but this is the end-all summary.

I got a few more replies after my original summary post.  Thanks to all, but 
especially to Bruce, who took the time to write out a few emails and even sent 
me his /etc/system file to help me out.

Bottom line -- my semaphores are fine!

The real problem was that, under Oracle-8, I had manually configured rollback 
segments that were carried forward into the upgraded database on Oracle-9.  I 
tried many times to tweak the memory settings, pool sizes, redo retention, etc. 
with absolutely no success.  No matter what I did - even setting the semaphores 
to astronomical values - it didn't work.  A truss of the oracle process always 
showed the same error as below.  My database had 24 rollback segments online 
that were set with (initial 512K next 512K maxextents 4096).  These were placed 
offline for the tablespace drop and a new segment was online'd that was set up 
specifically for this action as (initial 100M next 50M maxextents 8192), which 
would give me a potential 400+ GB rollback to flush a 6GB database.  The total 
space used in the rollback tablespace never went over 2GB, even though THAT was 
set to be able to hit 8GB - and I have the disk space for it - and even with the 
retention set to 30 seconds this job would die.
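
The size limits implied by those storage clauses can be checked with a little 
arithmetic.  This is only a sketch: it assumes K and M are binary units, that 
pctincrease is 0, and that maxextents counts the initial extent.  The function 
name is made up for illustration.

```python
# Rough upper bound on a rollback segment's size from its storage clause.
# Assumptions (not stated in the post): binary units, pctincrease 0,
# and maxextents includes the initial extent.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def max_segment_bytes(initial, next_extent, maxextents):
    """One initial extent plus (maxextents - 1) extents of size 'next'."""
    return initial + (maxextents - 1) * next_extent


# The 24 legacy segments: initial 512K next 512K maxextents 4096
legacy_each = max_segment_bytes(512 * KIB, 512 * KIB, 4096)
legacy_total = 24 * legacy_each

# The dedicated segment: initial 100M next 50M maxextents 8192
big = max_segment_bytes(100 * MIB, 50 * MIB, 8192)

print(f"legacy, each : {legacy_each / GIB:.2f} GiB")
print(f"legacy, all  : {legacy_total / GIB:.2f} GiB")
print(f"dedicated    : {big / GIB:.2f} GiB")
```

Under those assumptions the dedicated segment tops out at roughly 400 GiB, in 
line with the 400+ GB figure, while each legacy segment is capped at 2 GiB.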

I set all my /etc/system parameters back to what they should be based on the 
application vendors' specifications.  Basically, it looks like what it did 
before I had this problem.  Then, I set up a new out-of-the-box instance of 
Oracle using the OLTP template, and did minor tweaking to the SGA sizing to give 
myself larger sort areas and such.  Runs fine now, with Oracle doing all the 
rollback controls.  The retention is still set to 30 seconds, and the UNDOTBS 
never goes over 1GB.  I don't know what magic there is, but I think that some of 
the legacy settings from previous versions (this started at one of the Oracle-7 
versions) got carried forward and finally had a chance to blow up on this 
version.  It actually had nothing at all to do with the semaphore settings.

I apologize for my very late final summary, but since this started working two 
weeks ago, I've had a lot of catch up work to do.

Again, many thanks to those who took the time to reply -- I greatly appreciate 
your help.

-Sal





Original Post:
Date: Thu, 14 Apr 2005 15:21:56 -0400 (EDT)
Subject: Semaphore Error

Hi All-

I have a situation where I am transporting an Oracle (9.0.1) tablespace between 
two identically outfitted and configured 450's running Solaris 8.  My script 
sets up the transaction to use a huge rollback segment in order to offline drop 
the datafile and then to drop the tablespace including contents.  Oracle will 
drop all objects from the database, but then hangs at the end.  A trace of the 
processes spawned shows:

5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) Err#11 
EAGAIN
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) 
(sleeping...)
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) Err#11 
EAGAIN
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) 
(sleeping...)
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) Err#11 
EAGAIN
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) 
(sleeping...)

I had to kill it to make it stop.  I actually did get a "Tablespace Dropped." 
message on the Oracle side, and there are no objects owned by this user and no 
references to the tablespace anywhere.  BUT... the tablespace still exists in 
dba_tablespaces and the datafile still exists in dba_data_files, so any attempt 
to import the transported tablespace afterwards dies. 

According to semop(2), the semtimedop() function will fail if:

     EAGAIN
           The timeout expired  before  the  requested  operation
           could be completed.

     The semtimedop() function will fail if one of the  following
     is detected:

     EFAULT
           The timeout argument points to an illegal address.

     EINVAL
           The timeout argument specified  a  tv_sec  or  tv_nsec
           value  less than 0, or a tv_nsec value greater than or
           equal to 1000 million.
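
For a longer trace, the outcomes can be tallied per system call instead of 
being read line by line.  A minimal sketch, assuming truss-style output with 
one call per line in the format shown in the trace above; the tally() helper 
and the regular expression are illustrative and not part of truss.

```python
import re
from collections import Counter

# Sample lines copied from the trace in the post, one call per line.
TRACE = """\
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) Err#11 EAGAIN
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) (sleeping...)
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) Err#11 EAGAIN
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) (sleeping...)
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) Err#11 EAGAIN
5944:   semtimedop(2293762, 0xFFFFFFFF7FFF8FEC, 1, 0xFFFFFFFF7FFF8FD8) (sleeping...)
"""

# pid, syscall name, argument list, then whatever was printed as the outcome
LINE_RE = re.compile(r"^(\d+):\s+(\w+)\((.*)\)\s+(.*)$")


def tally(trace_text):
    """Count (syscall, outcome) pairs in truss-style output."""
    counts = Counter()
    for line in trace_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            _pid, syscall, _args, outcome = m.groups()
            counts[(syscall, outcome.strip())] += 1
    return counts


counts = tally(TRACE)
for (syscall, outcome), n in sorted(counts.items()):
    print(f"{syscall:12s} {outcome:16s} {n}")
```

A process that does nothing but alternate between sleeping and EAGAIN on the 
same semaphore is waiting, with a timeout, for something that never arrives.  
Per the man page text, EAGAIN here only says that the timeout expired.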

There are no errors other than EAGAIN.  My /etc/system file looks good:

set msgsys:msginfo_msgmax=8192
set msgsys:msginfo_msgmnb=16384
set msgsys:msginfo_msgmni=1700
set msgsys:msginfo_msgtql=512
set semsys:seminfo_semmns=2048
set semsys:seminfo_semmnu=2048
set semsys:seminfo_semmsl=2048
set semsys:seminfo_semmni=100
set semsys:seminfo_semume=256
set shmsys:shminfo_shmmax=2147483647
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set maxusers=256
set nproc=4096
set pt_cnt=256
set rlim_fd_cur=2048
set rlim_fd_max=8192
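
When a file like this has to be compared against one from another host or 
against a vendor's specification, a small script saves eyeballing each line.  
A sketch, assuming only the simple "set name=value" form used above; the 
function names are made up, and the second set of values is hypothetical and 
exists purely to show a difference.

```python
def parse_etc_system(text):
    """Return {tunable: value} for 'set name=value' lines; '*' starts a comment."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        if line.startswith("set "):
            name, sep, value = line[4:].partition("=")
            if not sep:
                continue
            value = value.strip()
            try:
                # base 0 accepts both decimal and 0x-prefixed hex
                settings[name.strip()] = int(value, 0)
            except ValueError:
                settings[name.strip()] = value  # keep anything unparsed as text
    return settings


def diff_settings(mine, theirs):
    """List (name, mine_value, theirs_value) for tunables that differ or are missing."""
    rows = []
    for name in sorted(set(mine) | set(theirs)):
        a, b = mine.get(name), theirs.get(name)
        if a != b:
            rows.append((name, a, b))
    return rows


# A few lines taken from the file in the post.
MINE = """\
set semsys:seminfo_semmns=2048
set semsys:seminfo_semmni=100
set shmsys:shminfo_shmmax=2147483647
"""

# Hypothetical comparison file; these values are invented for the example.
OTHER = """\
* comparison host
set semsys:seminfo_semmns=2048
set semsys:seminfo_semmni=200
set shmsys:shminfo_shmmax=0x7fffffff
"""

differences = diff_settings(parse_etc_system(MINE), parse_etc_system(OTHER))
for name, a, b in differences:
    print(f"{name}: {a} vs {b}")
```

Values are compared as numbers, so a hex and a decimal spelling of the same 
limit do not show up as a difference.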


Ideas?  Greatly appreciated,
-Sal
Received on Wed May 11 11:32:27 2005
