SUMMARY: Disk failure?

From: Richard Butler <rbutler_at_ibc.cnr.it>
Date: Mon Jun 18 2007 - 13:06:22 EDT
Hi,

I asked:
> Yesterday I had a disk fail on a SunFire 280R Solaris 8.
> The questions first:
> Can the experts confirm my opinion that this is a hardware problem - 
> disk failure?
Thanks to all those who helped (Roger Kynaston, Grant Lowe, Michael
Grice, Brad Morrison and Abhijit Das). The consensus was that this was
indeed a hardware disk failure and not the controller, since a controller
fault would also have affected the other drive. This was confirmed by
iostat, which showed multiple hard errors, and by smartd, which was
unable to register the disk (Device: /dev/rdsk/c1t1d0s0, failed Test
Unit Ready [err=-5]) but did register the good one. smartd is part of
the smartmontools package from SourceForge; had I installed it, it might
have given me advance warning of the failure.
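
For anyone who wants to script the iostat check, here is a minimal
Python sketch. It assumes the Solaris `iostat -E` layout, in which each
device's first line reads `<device> Soft Errors: N Hard Errors: N
Transport Errors: N`; the sample text, the function names and the
one-error threshold are hypothetical, not output from this server.

```python
import re

# Per-device summary line of Solaris `iostat -E` (assumed layout), e.g.
# "ssd0      Soft Errors: 0 Hard Errors: 12 Transport Errors: 0"
SUMMARY = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def hard_errors(iostat_output):
    """Return {device: hard error count} for every summary line found."""
    counts = {}
    for line in iostat_output.splitlines():
        match = SUMMARY.match(line)
        if match:
            counts[match.group("dev")] = int(match.group("hard"))
    return counts


def suspect_disks(iostat_output, threshold=1):
    """Devices whose hard error count is at least `threshold` (an assumed cut-off)."""
    return sorted(dev for dev, n in hard_errors(iostat_output).items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical sample; the counts are illustrative, not from the failed server.
    sample = (
        "ssd0      Soft Errors: 0 Hard Errors: 12 Transport Errors: 0\n"
        "Vendor: SEAGATE  Product: ST373405FSUN72G\n"
        "ssd1      Soft Errors: 0 Hard Errors: 0 Transport Errors: 0\n"
    )
    print(suspect_disks(sample))  # ['ssd0']
```

In practice the input would be the real command output, for example
`subprocess.run(["iostat", "-E"], capture_output=True, text=True).stdout`.
This only catches errors after the fact; smartd is the tool for advance
warning.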

I also asked:
> Have you any suggestions for recovering data from this drive (I do 
> have backups, but I would still lose some important data)?
Suggestions were that I might be able to revive it temporarily by
slapping it and/or putting it in the fridge for a couple of hours. Short
of that, the only options are to forget it or to pay for an expensive
data-recovery service. I can confirm that both tricks have worked for me
with PC disks in the past. Warning: don't try the fridge trick in a
humid atmosphere, or condensation can cause even worse problems (if that
is possible!).

I tried both methods, but with no luck. I have ordered a new drive and
will recover what I can from backups. In addition, I will probably
install smartd on this and other servers.

Thanks again
Richard Butler

Details of the original question:
>
> Symptoms:
> This machine has two 72G disks (not mirrored), and during the reboot
> after installing the latest recommended patches I got the warning:
> ...
> Jun 14 12:06:30 ed pcisch: [ID 370704 kern.info] PCI-device: SUNW,qlc@4, qlc0
> Jun 14 12:06:30 ed genunix: [ID 936769 kern.info] qlc0 is /pci@8,600000/SUNW,qlc@4
> Jun 14 12:06:30 ed genunix: [ID 936769 kern.info] fp0 is /pci@8,600000/SUNW,qlc@4/fp@0,0
> Jun 14 12:06:31 ed genunix: [ID 405830 kern.warning] WARNING: Device ssd0 failed to power up.
> Jun 14 12:06:32 ed genunix: [ID 749148 kern.warning] WARNING: Please see your system administrator or reboot.
> Jun 14 12:06:32 ed scsi: [ID 799468 kern.info] ssd0 at fp0: name w21000004cf8e7591,0, bus address e8
> Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd0 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
> Jun 14 12:06:32 ed scsi: [ID 365881 kern.info]  Vendor 'SEAGATE', product 'ST373405FSUN72G', (unknown capacity)
> Jun 14 12:06:32 ed genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0 (ssd0) online
> Jun 14 12:06:32 ed scsi: [ID 799468 kern.info] ssd1 at fp0: name w21000004cf8e7555,0, bus address ef
> Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd1 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
> Jun 14 12:06:32 ed scsi: [ID 365881 kern.info]  <SUN72G  cyl 14087 alt 2 hd 24 sec 424>
> Jun 14 12:06:32 ed genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0 (ssd1) online
> Jun 14 12:06:32 ed swapgeneric: [ID 308332 kern.info] root on /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w21000004cf8e7555,0:a fstype ufs
> ...
> And of course all the filesystems on this disk failed to fsck or mount.
>
> Using format I can see the bad disk as c1t1d0 (although "Searching
> for disks..." seems to take longer than normal):
>     0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
>           /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
>     1. c1t1d0 <drive type unknown>
>           /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
>
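The boot log names the failed disk by driver instance (ssd0), while
format uses the logical name (c1t1d0). Both print the same physical
path, and the WWN in it (w21000004cf8e7591) ties the two together. A
minimal Python sketch of that lookup, using the lines quoted above with
the mail wrapping undone; the function names are hypothetical helpers,
not a Solaris tool.

```python
import re

# Lines quoted above, with the mail line-wrapping undone.
BOOT_LOG = """\
Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd0 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd1 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
"""

FORMAT_LISTING = """\
    0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
    1. c1t1d0 <drive type unknown>
          /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
"""


def instance_paths(log_text):
    """Map driver instance (e.g. 'ssd0') -> physical path from 'ssdN is /...' lines."""
    return dict(re.findall(r"\b(ssd\d+) is (/\S+)", log_text))


def logical_paths(format_text):
    """Map physical path -> logical name (e.g. 'c1t1d0') from a format disk listing."""
    mapping = {}
    current = None
    for raw in format_text.splitlines():
        line = raw.strip()
        entry = re.match(r"\d+\.\s+(c\d+t\d+d\d+)\b", line)
        if entry:
            current = entry.group(1)      # numbered entry, e.g. "1. c1t1d0 <...>"
        elif current and line.startswith("/"):
            mapping[line] = current       # the path line that follows the entry
            current = None
    return mapping


def logical_name(instance, log_text, format_text):
    """Return the cXtYdZ name for a driver instance, or None if it is not found."""
    path = instance_paths(log_text).get(instance)
    return logical_paths(format_text).get(path)


if __name__ == "__main__":
    print(logical_name("ssd0", BOOT_LOG, FORMAT_LISTING))  # c1t1d0
```
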
> If I go ahead and give the correct geometry I can then see the 
> partition table as I had it before the crash.
> Part      Tag    Flag     Cylinders         Size            Blocks
>   0       root    wm       0 -   412        2.00GB    (413/0/0)      4202688
>   1       swap    wu     413 -  1237        4.00GB    (825/0/0)      8395200
>   2     backup    wm       0 - 14086       68.35GB    (14087/0/0)  143349312
>   3 unassigned    wm    1238 -  1240       14.91MB    (3/0/0)          30528
>   4        var    wm    1241 -  2065        4.00GB    (825/0/0)      8395200
>   5 unassigned    wm    2066 -  6187       20.00GB    (4122/0/0)    41945472
>   6        usr    wm    6188 -  8248       10.00GB    (2061/0/0)    20972736
>   7       home    wm    8249 - 14086       28.33GB    (5838/0/0)    59407488
>
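The Blocks column follows from the geometry format reports: 24 heads x
424 sectors gives 10176 blocks per cylinder, and each slice is its
cylinder count times that. A minimal Python check against the table
above, assuming 512-byte blocks and binary (1024-based) GB/MB, which is
what the Size column implies; the names are illustrative.

```python
HEADS = 24
SECTORS_PER_TRACK = 424
BLOCK_BYTES = 512                            # assumed sector size
BLOCKS_PER_CYL = HEADS * SECTORS_PER_TRACK   # 10176

# (slice, tag, first cylinder, last cylinder, Blocks column), copied from the table
SLICES = [
    (0, "root", 0, 412, 4202688),
    (1, "swap", 413, 1237, 8395200),
    (2, "backup", 0, 14086, 143349312),
    (3, "unassigned", 1238, 1240, 30528),
    (4, "var", 1241, 2065, 8395200),
    (5, "unassigned", 2066, 6187, 41945472),
    (6, "usr", 6188, 8248, 20972736),
    (7, "home", 8249, 14086, 59407488),
]


def slice_blocks(first_cyl, last_cyl):
    """Blocks in a slice spanning cylinders first_cyl..last_cyl inclusive."""
    return (last_cyl - first_cyl + 1) * BLOCKS_PER_CYL


def size_gib(blocks):
    """Size in binary gigabytes (2**30 bytes)."""
    return blocks * BLOCK_BYTES / 2**30


if __name__ == "__main__":
    for num, tag, first, last, listed in SLICES:
        blocks = slice_blocks(first, last)
        status = "ok" if blocks == listed else "MISMATCH"
        print(f"{num} {tag:>10} {blocks:>10} {size_gib(blocks):8.2f} GiB  {status}")
```

Every row checks out, which suggests the label re-entered with format
matches the original layout, so the I/O errors are not caused by a
wrong geometry.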
> I cannot however mount any partition:
> mount /dev/dsk/c1t1d0s5 /mnt
> mount: I/O error
> mount: cannot mount /dev/dsk/c1t1d0s5
> _______________________________________________
> sunmanagers mailing list
> sunmanagers@sunmanagers.org
> http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Mon Jun 18 13:06:18 2007

This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:44:06 EST