SUMMARY: Solaris 10 zone - user processes crashing randomly

From: Pascal Grostabussiat <pascal_at_azoria.com>
Date: Mon Oct 08 2007 - 11:32:08 EDT
Hi,

Many thanks to those who replied and tried to help.

We have been investigating this problem for a while, and so far no real
explanation has been found. Our own analysis points to a potential
issue/bug in Oracle's libclntsh.so library (we have been using versions
10.2.0.2 and 10.2.0.3), which our software uses; it seems to involve the
sslsshandler in that library. The only feedback we got from Oracle
concerned a bug in Oracle 8 (check bug 2012268 on Oracle's side for more
details). However, as mentioned earlier, we are running Oracle 10. We
went back to Oracle 9 and the issue disappeared. When we returned to
Oracle 10, the issue came back. So we have now implemented the
work-around that Oracle suggested for Oracle 8, adapted it for Oracle 10,
and got much better stability ... !?

For the record:

Before starting the process which performs dlopen/dlclose of a module linked
with Oracle, set the environment variable LD_PRELOAD to point to the
libclntsh.so file that is being used. For example:

setenv LD_PRELOAD $ORACLE_HOME/rdbms/lib/libclntsh.so.8.0

This maps libclntsh permanently and avoids the core dump. This variable must
only be set for programs that encounter the core dump.
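
As a rough sketch of that last point (this is not part of Oracle's note): in
a Bourne-style shell, the variable can be scoped to a single program by
passing it through env rather than setting it for the whole session. The
function name run_with_preload and the program name in the usage comment are
hypothetical, and the library path is an argument because it depends on which
libclntsh file your installation actually uses.

```shell
#!/bin/sh
# Hypothetical helper: run one command with LD_PRELOAD set for that command only.
# Usage: run_with_preload <path-to-libclntsh> <command> [args...]
run_with_preload() {
    preload_lib="$1"
    shift
    # env places LD_PRELOAD in the child's environment only; the calling
    # shell, and anything else started from it, is left untouched.
    env LD_PRELOAD="$preload_lib" "$@"
}

# Example call (my_app is a made-up program name; the path is the one
# from the example above):
# run_with_preload "$ORACLE_HOME/rdbms/lib/libclntsh.so.8.0" ./my_app
```

Keeping the setting inside a wrapper like this, instead of in a login
script, is one way to make sure only the affected programs see it.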

Regards,
/Pascal

Pascal Grostabussiat wrote:
> Hi guys,
>
> I am puzzled by that issue and I have never seen such things happening 
> before. I hope you can point me to some new directions or any 
> information sources on the net that might be relevant.
>
> I am in a Solaris 10 environment. Our applications have been installed
> in a dedicated zone. The applications are nothing new; we have been
> running them in many different kinds of environments, including similar
> ones (Solaris 10 zones), and no such issue has been seen before.
>
> User processes had been running for a month or two, and one day some of
> them started crashing for no apparent reason. After a few repeated
> crashes they were stable again. Then a few hours later, sometimes the
> day after, other or similar user processes crashed again. This has been
> going on for about two to three weeks now. The user processes are both
> C/C++ and Java processes, and the crashing processes are of both kinds.
> Sometimes one specific user process crashes, sometimes 2, 3 or 4 in the
> same period, not simultaneously but going up and down within the same
> chaotic period of time (from 1 hour to 2-3 hours), before things get
> stable again for several hours.
>
> We have inspected the logs of our applications and of course the
> core-files but could not get any clue !? According to some core-files,
> some processes appear to receive a SIGABRT signal (regular kill
> (SIGTERM) signals are handled by the applications as a normal shutdown),
> while others seemed to be waiting in their normal course of action
> just before they crashed (still according to some core-files). Our
> developers checked the core-files in detail but could not get any clue.
>
> I have checked the resource limits on the platform and they are no
> different from those in other environments where the applications are
> stable. We have been investigating core-files using pflags but could not
> get more clues on that side. The remote DB and the network have been
> investigated too, but nothing has been found there either. I have asked
> people in the project to report the activities they were performing at
> crash time but could not find any pattern. I have talked with the local
> sysadmins to track any kind of external activities (with respect to our
> zone) that might be triggered now and then, but nothing.
>
> So my question is: has anyone experienced such REALLY weird
> events in their own environment?
>
> Feel free to send ANY idea, or point to any tools or commands (I cannot
> really be root) that might help, because I am stuck and running short of
> ideas !? I have been working with Sun environments since SunOS 4, from
> Sparc Classic ;-) to SF15K, and I have never seen this before !?!?
>
> MANY thanks in advance!
> Regards,
> /Pascal
> _______________________________________________
> sunmanagers mailing list
> sunmanagers@sunmanagers.org
> http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Mon Oct 8 11:28:09 2007
