SUMMARY: System with RAID array fails to reboot after power failure

From: Peter Schauss x 2014 (ps4330@okc01.rb.jccbi.gov)
Date: Thu Aug 22 1996 - 10:23:46 CDT


I am going to summarize while the number of responses is still
manageable. Thanks to everyone who responded, and apologies to
those who responded after I composed this note.

Original question:

I am running a SPARC 20 (Solaris 2.4) with a Sun SPARC Storage Array.
The disks on the array are configured as RAID 5 volumes managed
by the Veritas volume manager (supplied with the array by Sun).

The system gives me no problems except when we have a power failure
which outlasts the UPS. In this situation, the disk array apparently
takes longer to complete its initialization process than the main
CPU. Thus when the CPU tries to mount the RAID volumes it gets
a large number of errors and the boot process fails. Since I administer
this system from a remote site, I would like everything to be as
automatic as possible (when this happens I cannot access the SPARC 20
via the network). Does anyone have a solution to this problem?

___________________________________________________________________________
Summary:

Most likely, I will try the approach suggested by
mshon@sunrock.East.Sun.COM (Michael J. Shon), Solution 3 below.

He sent me two files:

etc_nvramrc.fth, to be placed in /etc/nvramrc.fth
This is a Forth program which inserts an 80 second delay at startup,
breakable by hitting any key.

init.d_nvramrc, to be placed in /etc/init.d/nvramrc
This shell script loads the above program into NVRAM at bootup. Put a
link to this file in one of the /etc/rc directories. As I understand it,
this script just reloads nvramrc.fth in case the CPU is swapped, so it
does not matter too much when it is run.
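
The init.d script itself is not reproduced in this summary. As a rough
idea of what such a script might look like, here is a minimal sketch. It
is not the file Michael sent. It only wraps the three eeprom commands
from step 2 of the SRDB text below. The function name install_nvramrc,
the EEPROM and NVRAMRC_FILE variables, and the /usr/sbin/eeprom path are
assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of /etc/init.d/nvramrc (NOT the script from the thread).
# Reloads the Forth delay program into NVRAM on every boot, so that a
# replacement CPU board or NVRAM chip picks it up automatically.

NVRAMRC_FILE=${NVRAMRC_FILE:-/etc/nvramrc.fth}  # Forth source, path from the SRDB
EEPROM=${EEPROM:-/usr/sbin/eeprom}              # assumed location; overridable

install_nvramrc() {
    if [ ! -r "$NVRAMRC_FILE" ]; then
        echo "nvramrc: cannot read $NVRAMRC_FILE" >&2
        return 1
    fi
    # Read the whole Forth program into one shell variable.
    program=`cat "$NVRAMRC_FILE"` || return 1
    # The same three settings as step 2 of the SRDB solutions.
    "$EEPROM" 'fcode-debug?=true' &&
    "$EEPROM" 'use-nvramrc?=true' &&
    "$EEPROM" "nvramrc=$program"
}

case "${1:-}" in
start)
    install_nvramrc
    ;;
stop|'')
    # Nothing to do at shutdown, or when called without arguments.
    ;;
*)
    echo "Usage: $0 { start | stop }" >&2
    ;;
esac
```

A link such as /etc/rc2.d/S99nvramrc pointing at this file (the link name
is again only an example) would run it with "start" at each boot.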
___________________________________________________________________________

Detailed Responses:

From: amy.hollander@amp.com (Amy Hollander)

In the eeprom, put an 80 second delay. That gives the SSA time to come up.
___________________________________________________________________________

>From MHILL@graver.com Thu Aug 22 10:22 EDT 1996
From: "Matt Hill" <MHILL@graver.com>
Date: Thu, 22 Aug 1996 10:24:43 EST
Subject: Re: System with RAID array fails to reboot after power failure

Put a sleep command somewhere near the beginning of /etc/rc2.d?
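
A minimal sketch of what this suggestion might look like as an rc script
follows. The script name, the wait_for_ssa function, the SSA_DELAY
variable and the 80 second default are all assumptions for illustration.
Nothing in this thread confirms that /etc/rc2.d runs early enough: device
probing and Volume Manager startup may already have happened by the time
these scripts execute.

```shell
#!/bin/sh
# Hypothetical rc script, e.g. linked as /etc/rc2.d/S01ssawait.
# Pauses the boot so the SPARCstorage Array can finish its self-test.

SSA_DELAY=${SSA_DELAY:-80}   # seconds; the SSA diagnostics need at least 75

wait_for_ssa() {
    total=$1
    elapsed=0
    echo "Waiting $total seconds for the SPARCstorage Array"
    # Sleep one second at a time, counting up to the requested total.
    while [ "$elapsed" -lt "$total" ]; do
        sleep 1
        elapsed=`expr "$elapsed" + 1`
    done
    echo "Delay complete"
}

case "${1:-}" in
start)
    wait_for_ssa "$SSA_DELAY"
    ;;
esac
```

Unlike the nvramrc approach, a delay here happens after the kernel has
already loaded, which is why the firmware-level solutions may be the
safer choice.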

___________________________________________________________________________

>From erich@decux.nvg.com Thu Aug 22 10:34 EDT 1996
From: erich@s1000e.nvg.com (Erich Breu)
Subject: Re: System with RAID array fails to reboot after power failure

Our two different UPSs notify our sparcserver once the battery power
is running out and the system performs a clean shutdown.
Liebert is our main UPS and they supply software that runs on the
SUN and communicates to the UPS via a serial line.

Erich
erich@decux.nvg.com

___________________________________________________________________________

>From mshon@sunrock.East.Sun.COM Thu Aug 22 10:45 EDT 1996
From: mshon@sunrock.East.Sun.COM (Michael J. Shon {*Prof Services} Sun Rochester)
Subject: Re: System with RAID array fails to reboot after power failure

The attachments implement Solution 3.

Subject: nvramrc for disk delay Internal SRDBs document 11298

http://sunservice.Corp.Sun.COM:80/cgi-bin/sunsolvei/doc2html?intsrdb/11298

                        Internal SRDBs document 11298

----------------------------------------------------------------------------

SRDB ID: 11298

SYNOPSIS: SSA requires delay to boot

DETAIL DESCRIPTION:

If a host machine boots faster than the SPARCstorage Array that is
attached can become ready, the host machine might fail to initialize
correctly.

The SSA has an internal diagnostic sequence that takes a minimum of 75
seconds. The host machine can often boot from a cold power on faster
than that. When it starts its routine of looking for devices
the SSA has not yet become ready.

SOLUTION SUMMARY:

SOLUTION 1
----------
The following forth code will delay the host for 80 seconds,
and print the seconds until boot, to allow the SSA to beat
the OS.

>ok nvedit
probe-all install-console banner
: wait_for_ssa
." Waiting 80 seconds for SSA" cr
d# 80 0 do
i .d (cr
d# 1000 ms
loop
;
wait_for_ssa

then ctrl c to exit
>ok nvstore
>ok setenv use-nvramrc? true

Note that this loop is not breakable, and will pause for 80 seconds after
every reset.

SOLUTION 2
----------

        1. Place the following in a file, /etc/nvramrc.fth
                This version lets you break out simply
                by pressing any single key.

                        -----------------------

probe-all \ install devices
install-console \ install console device
banner \ output banner

: abort-on-key ( -- ) \ Define the function, ( -- ) is a comment
                                \ that means 'no stack change'

  key? \ key pressed?
  abort" Booting continuing. Waiting timer aborted." \ abort with
                                \ message if true
; \ finish function definition

: timed-startup ( -- ) \ Define the function, ( -- ) is a
                                \ comment that means 'no stack change':
                                \ nothing is taken from the stack and
                                \ nothing is left on it on return

 ." Waiting for the SPARCstorage Array disks to spin-up."
 cr
 ." Press any key to abort timer and continue to boot." cr
 0 B4 do i \ do loop (count down 180 seconds; B4 hex = 180)
   ." Booting will continue in " \ construct the count down string
   .d \ print the value in decimal
   i 1 = if ." second." \ with the correct plural!
   else ." seconds..." \ I am a fussy programmer
   then (cr \ let us get the plurals correct
   d# 1000 ms \ spin for 1000 milliseconds
   abort-on-key \ check for a key
 -1 +loop \ subtract 1 from the i index then loop
 cr
 ." Booting continuing. Waiting time elapsed." cr \ all done
;

timed-startup \ call timed-startup

                        -----------------------

        2. Do the following. You may want to place this in one of the
                startup scripts, so that if a CPU is swapped it is automatically
                re-installed.

                eeprom fcode-debug?=true
                eeprom use-nvramrc?=true
                eeprom nvramrc="`cat /etc/nvramrc.fth`"

        3. Reboot the machine.

SOLUTION 3
----------

There have been reports of SOLUTION 2 not running on some platforms,
especially 1000E and 2000E machines.

The following code has been tested on 1000E and 2000E machines. Please use
this code in place of the above.

        1. Place the following in a file, /etc/nvramrc.fth

probe-all \ install devices
install-console \ install console device
banner \ output banner

  .( Timer implemented to allow SSA's to boot from cold start)
  cr

: abort-on-key ( -- ) \ Define the function, ( -- ) is a comment
                                \ that means 'no stack change'
  key? \ key pressed?
  abort" Start delay aborted" \ abort with message if true
; \ finish function definition

: timed-startup ( -- ) \ Define the function, ( -- ) is a
                                \ comment that means 'no stack change':
                                \ nothing is taken from the stack and
                                \ nothing is left on it on return

  .( Press any key to abort timer) cr

d# 80 0 do \ set up loop parameters
i .d (cr \ print the value
d# 1000 ms \ wait a second
abort-on-key \ Key pressed?
loop \ do it again

  ." Timer complete" cr \ all done
; \ finish function definition

timed-startup \ call the routine

        2. Do the following. You may want to place this in one of the
                startup scripts, so that if a CPU is swapped it is automatically
                re-installed.

                eeprom fcode-debug?=true
                eeprom use-nvramrc?=true
                eeprom nvramrc="`cat /etc/nvramrc.fth`"

        3. Reboot the machine.

----------------------------------------------------------------------------

If you have access to sunsolve, see srdb/11298 for some forth
code you enter to delay the boot process while waiting for the
SSA to get ready.

----------------------------------------------------------------------------


