SUMMARY: ultra 60/sol8: unexpected halt - no log/message

From: brsys <brsys_at_earthlink.net>
Date: Sun Oct 05 2003 - 01:37:59 EDT
Hi All,

First, thank you to all of you.

My problem was (summary):
This machine boots and runs fine for anywhere from 1 hour to 1-2 days
(we run some apps, compilers, editors, X), then suddenly stops/halts
without any messages in the logs or on the console, nothing.
(See the complete description at the end.)

The root cause seems to have been the power supply. I changed it 10 days
ago and the machine hasn't crashed since then.
(Before replacing the power supply I had already removed the power
management packages, run the machine with the cover off so it got plenty
of air, and removed/reinserted the memory and CPU modules several times.
My CPUs are 300 MHz U2.)

See the different answers I got below.

Again, thank you to all of you. This is a great list.
Bob


From Stephen:
What CPU modules do you have in there? Some later (400 and 450 MHz
UltraSPARC-II) modules had problems that match quite well with what you're
describing (Ecache, parity tag problems). Sometimes it shows up in the
AFSR/AFAR registers, sometimes not. Updating to a newer OBP *might* help in
getting better diagnostics.


From Pete:
I've seen two reasons for this happening:
1> the processor isn't seated
2> (most common) over-heating. Try pulling the cover off and let it run.
  If you've got some canned air, hit the area around the CPU and power
  supply.

From Jeff:
My best bet is you installed Solaris 8 with the power manager enabled by
default.  I had this same problem on an Ultra 60 I installed Sol 8 onto, and
it took me a while to figure out that the power manager was enabled and was
powering down the system after some period of non use.


From Stephen:
You have a console/terminal server to log the console messages?  Just
curious, since a directly attached monitor wouldn't help if you have
power-offs.  
I would suspect power management here, but on the other hand I would also
expect it to report some action it's taking via console at least, if not
syslog (/var/adm/messages) as well.  I believe the config file is
/etc/power.conf if you want to check it.  I'm no longer sure, since we long
ago purged all power management packages (along with most of the other 700+
fluff packages in the full Sun dist) from our servers, since they did
nothing good for us and potentially something bad.
Does "prtdiag -v" show anything interesting?
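
For anyone who wants to script the /etc/power.conf check mentioned above,
here is a minimal Python sketch. It assumes the Solaris 8 power.conf(4)
layout "autoshutdown idle_time start_time finish_time behavior" as I
recall it from the man page; the function names are made up for
illustration, and it only reads the file, it changes nothing.

```python
import sys

# Behavior keywords as I recall them from power.conf(4); treat as an assumption.
NOTES = {
    "shutdown": "autoshutdown is enabled",
    "autowakeup": "autoshutdown is enabled (with wakeup where supported)",
    "default": "model-dependent; some desktops shut down automatically",
    "noshutdown": "autoshutdown is disabled",
    "unconfigured": "autoshutdown is disabled (not configured)",
}


def find_autoshutdown(conf_text):
    """Return (idle_time, start, finish, behavior) from the first
    autoshutdown line, or None if there is no such line."""
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 5 and fields[0] == "autoshutdown":
            return tuple(fields[1:])
    return None


def describe(setting):
    """Turn the parsed setting into a one-line human-readable note."""
    if setting is None:
        return "no autoshutdown line found"
    idle, start, finish, behavior = setting
    note = NOTES.get(behavior, "unknown behavior keyword")
    return f"idle {idle} min, window {start}-{finish}, {behavior}: {note}"


if __name__ == "__main__":
    # Path defaults to the config file named in the thread.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/power.conf"
    try:
        with open(path) as fh:
            print(describe(find_autoshutdown(fh.read())))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```

If the behavior field comes back as "shutdown", "autowakeup" or "default",
power management is still worth ruling out before suspecting hardware.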


From Bruce:
This sounds like the problem we were experiencing on one of our Ultra 10s.
The server just stopped working without warning or anything in
/var/adm/messages.  It turned out to be a bad power supply.



brsys wrote:

>
> Hi All,
>
> We got a new (used) Ultra 60 on which we installed Solaris 8 plus the
> latest pack of patches from Sun.
>
> This machine boots and runs fine for anywhere from 1 hour to 1-2 days
> (we run some apps, compilers, editors, X), then suddenly stops/halts
> without any messages in the logs or on the console, nothing.
> It is not a "normal" shutdown or halt, since it looks like the power
> is switched off!
>
> It's lightly loaded (1 user), connected through ssh. The stops occur
> whether the user is working or idle. I'm currently trying to reproduce
> the problem with the user working directly at the machine's
> screen/keyboard.
>
> I ran some diagnostic tests but they showed no problems.
>
> Any advice on where I should start to solve this problem? Or any URL
> that describes some kind of process for finding the problem?
>
>
> Thanks a lot!!!
>
> Bob
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Sun Oct 5 01:37:56 2003
