SUMMARY: SS5 keeps rebooting

From: Feng Qiu (fqiu@bmb-fs1.biochem.okstate.edu)
Date: Wed May 28 1997 - 11:41:18 CDT


Thanks for all the responses. I reseated the RAM, checked all the device
connections, and ran fsck. It works fine now!
Feng Qiu

Original message:

> Hello, all,
> One of my SS5s running Solaris 2.5.1 keeps rebooting. The error message
> said:
>
> Unrecoverable DMA error on dma
> Panic: asynchronous memory fault:
> MFSR=80804820 MFAR=cdb800
>
> Does anyone know what is wrong? Thanks!!
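
The MFSR and MFAR values in the panic above are, on these machines, the memory controller's fault status and fault address registers. A minimal sketch, assuming you have the panic text saved somewhere (for example a serial-console capture, or /var/adm/messages if the message made it there), that counts how often each MFAR address appears. If the same address recurs panic after panic, that is consistent with a fixed bad location such as a failing SIMM, which several replies below suspect; addresses that wander point less clearly at one module. The log path and function names are illustrative, not from the original thread.

```python
import re
import sys
from collections import Counter

# Matches the register line printed with the panic, e.g.
#   "MFSR=80804820 MFAR=cdb800"
# Both values are hexadecimal without a 0x prefix.
FAULT_RE = re.compile(r"MFSR=([0-9a-fA-F]+)\s+MFAR=([0-9a-fA-F]+)")


def fault_addresses(lines):
    """Count how often each MFAR value (parsed as an int) appears in lines."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[int(match.group(2), 16)] += 1
    return counts


def main(path):
    # Hypothetical log file supplied by the user, e.g. a console capture.
    with open(path, errors="replace") as log:
        for addr, n in fault_addresses(log).most_common():
            print(f"MFAR=0x{addr:x} seen {n} time(s)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Mapping a physical address to a particular SIMM slot is model-specific, so treat a repeating MFAR as a hint to test memory, not as a slot number.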

-------------------------------------------------------
"David M. Davisson" <davisson@emuni.com>

This usually indicates that some device on the SCSI bus is not responding
properly, often because of a loss of termination on the bus. It is hard
to say which device if you have multiple devices. If you only have
an internal hard disk on the bus, then you had better get ready to
replace it.

--------------------------------
No Spam Accepted <nospam@getbent.eng>@ppplin

Well, it certainly sounds like a hardware problem. I would
start by running the extended POST (power-on self-tests).
To do this, connect a dumb terminal to serial port A (or,
if you have a laptop or another workstation nearby, you could
use a tip session). Then set the "diag-switch?" NVRAM parameter
to true (at the ok prompt: setenv diag-switch? true). Power-cycle
the faulty machine and observe the output on the device connected
to serial port A.

If any error is identified at this stage, you'll need to ring
up for some new hardware.

If all POSTs pass, look at the banner. Does it report the amount of
memory that is supposed to be in the system, or is it different?
Wait for the system to test and initialize all of your memory. Again,
if an error is identified, pluck out the offending item and replace it.
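
A minimal sketch of the banner check described above: compare the memory the banner reports with the total of the SIMMs you know are fitted. It assumes the banner states memory in the form "<N> MB memory installed"; that wording is an assumption, so adjust the pattern to whatever your PROM actually prints. The function names and the example banner string are made up for illustration.

```python
import re

# ASSUMPTION: the PROM banner states memory as "<N> MB memory installed".
# Adjust the pattern if your banner words it differently.
BANNER_MEM_RE = re.compile(r"(\d+)\s*MB memory installed")


def banner_memory_mb(banner_text):
    """Return the memory size in MB reported by the banner, or None."""
    match = BANNER_MEM_RE.search(banner_text)
    return int(match.group(1)) if match else None


def check_memory(banner_text, simm_sizes_mb):
    """Compare banner-reported memory with the total of the fitted SIMMs.

    Returns (reported_mb, expected_mb, matches).
    """
    reported = banner_memory_mb(banner_text)
    expected = sum(simm_sizes_mb)
    return reported, expected, reported == expected


# Made-up example: three SIMMs fitted, but the banner reports only 48 MB.
print(check_memory("... 48 MB memory installed ...", [16, 16, 32]))
# prints (48, 64, False)
```

A shortfall like the one in the example suggests a module the PROM did not recognize, which would be the first item to reseat or replace.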

If there is still no evidence of error, check all SCSI and ethernet
cabling, as well as any external SCSI devices. It wouldn't hurt
to open the case at an Anti-ESD station and make sure all memory SIMMs
are in place, and all SBus cards are secured.

Without knowing the EXACT configuration of the system and having more
detailed knowledge of the error, I can't help much more right now.

-- 
Apologies for the Anti-SPAM return address.  Mail can
be sent to misoft.com with a userid of my first name.

Cecil Jacobs miSOFT Engineer

------------------------------------- Stephen Harris <sweh@mpn.com>

When I had this problem, it was a dodgy SIMM chip :-(

-------------------------------------- ina@experts-exchange.com (Ina Gardner)

We saw your post in comp.unix.solaris and we thought you might be interested in our new Web site -- http://www.experts-exchange.com. Our site provides technical help and expert consulting on a variety of topics, for free. Give us a try! We promise friendly assistance, and every question, simple or complex, is welcome on our site.

Thanks,

Ina Gardner http://www.experts-exchange.com

P.S. (We keep track of the email addresses we send to in order to avoid repeat emails.)

----------------------------------------- "zenman" <shane_bush@fmi.com>

You've got a bad memory chip. Pull them out one by one and restart the machine until you work out which one it is. Or, if you have only one chip, borrow one from another machine to make sure.
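
A minimal sketch of that elimination procedure, assuming the fault is caused by exactly one module and that the machine still boots with any single SIMM removed (check the slot-population rules for your model first). The slot names and the machine_ok callback are hypothetical; in practice each call is you refitting SIMMs and power-cycling.

```python
from typing import Callable, List, Optional


def find_bad_simm(simms: List[str],
                  machine_ok: Callable[[List[str]], bool]) -> Optional[str]:
    """Pull one SIMM at a time and return the one whose removal cures the fault.

    machine_ok(installed) is a hypothetical stand-in for the manual step:
    fit exactly the SIMMs in `installed`, restart, and report whether the
    machine now runs without the error. Returns None if the full set already
    works or if no single removal helps (the fault may lie elsewhere).
    """
    if machine_ok(simms):
        return None
    for i, candidate in enumerate(simms):
        remaining = simms[:i] + simms[i + 1:]
        # With only one SIMM there is nothing left to boot on; swap in a
        # borrowed SIMM instead, as suggested above.
        if remaining and machine_ok(remaining):
            return candidate
    return None


# Simulated example: pretend the SIMM in slot1 is the faulty one.
fitted = ["slot0", "slot1", "slot2"]
print(find_bad_simm(fitted, lambda installed: "slot1" not in installed))
# prints slot1
```

Stopping at the first removal that clears the error keeps the work to at most one restart per SIMM, which matters when every test means opening the case.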

----------------------------------- crguev@velu.com (Carlos R. Guevara - Sun Managers Acc.)

Apparently you have a memory SIMM problem. I encountered the same problem with an SS4 that I had upgraded with Kingston memory. When I replaced the SIMM, the problem went away.

DMA means direct memory access. It seems the problem is that some bytes in the SIMM are returning bogus values, making the machine lock up.

------------------------------------ mikec@lib.siu.edu (Mike Connor)

The last time I saw that was when I tried putting non-parity RAM in a SPARCclassic. If it just started doing this with no 'help' (nobody changed the hardware), try changing out the RAM.

Mike

------------------------------ "Cheng, Bruce" <Bruce.Cheng@Aspect.com>

You may want to open the box and reseat the memory to see if that helps. Checking that the memory connectors are clean may also help.

-------------------------------- "Matthew Stier" <mstier@hotmail.com>

Bad SIMM. Stop-A the system and set selftest-#megs to the number of megabytes in the system. Then 'reset' the system and see which SIMM is bad.

------------------------------ D.White@mcs.surrey.ac.uk

Sounds like the memory needs testing. Go to the ok PROM prompt and type:

    setenv selftest-#megs 64     (or however much memory the SS5 has)
    test-memory

--------------------------------- Benjamin Cline <benji@hnt.com>

I'm not 100% sure what the problem is, but it sounds like a hardware problem to me. Is your system still under warranty? Do you have a service contract? If you do, I'd suggest calling Sun and asking them for help.

----------------------------------- Paul Kanz <paul@icx.com>

Have you reseated the memory yet? Looks like you have a problem with a SIMM. Might also try testing the memory from the OpenBoot prompt.

----------------------------------- Jim Harmon <jharmon@telecnnct.com>

My SS5 just did exactly the same thing.

This means your SIMM is bad. (SIMM = Single In-line Memory Module)

Depending on how many you have (I think you can have up to 8), this is probably an indication that the first one is bad.

You can test this (CAREFULLY!) by moving the SIMM in the first slot (closest to the back of the machine) to another slot and trying to reboot. If the error changes at all, that's the bad SIMM. If you have two SIMMs, swap them so that the first is in the second slot and the second is in the first. If your system then boots, the SIMM that was in the first slot was bad.

-----------------------------------



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:11:55 CDT