SUMMARY: automated backups

From: Daryl Crandall (daryl@oceanus.mitre.org)
Date: Wed Jun 26 1991 - 15:04:08 CDT


Sun-managers,

Here's the summary of the automatic dump techniques that I've received.
Sorry about the delay. System administration never gets easier, just more
extensive. More users, more machines, more software, more networks, and more
mail. I'm sure glad there are 36 hours in a day, otherwise I'd never make
it :-)

        Daryl Crandall
        The Mitre Corporation
        daryl@mitre.org
        (703) 883-7278

#############################################################################
ORIGINAL QUESTION:

The return of summer thunderstorms has reminded us of the need for adequate
level 0 backups and incrementals.

We are going to purchase uninterruptible power supplies (UPS) for our servers
but that will be too late to help us this summer.

I'd like to hear some success stories of completely automated backup techniques
that guarantee proper level 0 dumps.

We have several servers with a total of 15GB of disk space and we have only one
8mm tape drive. I want to be able to schedule a level 0 dump of 2GB at least
once each week. A level 7 dump will be performed each Friday night, and daily
(level 9) backups each evening except the nights of the level 0 and 7 dumps.
This should give me a complete level 0 cycle of 7 weeks.
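
The schedule described above can be sketched as a small Bourne shell
function. This is only an illustration: the function name level_for_night
and the idea of passing in the weekday chosen for the level 0 are
assumptions, as is the rule that a level 0 takes precedence if it happens
to fall on a Friday.

```sh
#!/bin/sh
# level_for_night TODAY LEVEL0_DAY
# Prints the dump level for tonight under the three-level scheme:
#   level 0 on the night chosen for the full dump,
#   level 7 on Friday night,
#   level 9 on every other night.
# Both arguments are weekday abbreviations as printed by "date +%a"
# in the C locale (Sun, Mon, Tue, ...).
level_for_night() {
    today=$1
    level0_day=$2
    if [ "$today" = "$level0_day" ]; then
        echo 0          # assumption: a level 0 wins even on a Friday
    elif [ "$today" = "Fri" ]; then
        echo 7
    else
        echo 9
    fi
}
```

For example, "level_for_night Fri Wed" prints 7 and "level_for_night Wed Wed"
prints 0.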

I've tried dismounting partitions, but that doesn't work if any user is
"using" the partition. Also, the / and the /usr partitions present a problem since they can't be dismounted when in multi-user mode.

I'd like to see an answer that either forces a partition to be dismounted, or
mounted read-only, or a method that takes the system down to single user mode, does the dump remotely to the tape server and comes back to multi-user automatically.

The goal here is to find a technique that guarantees a good level 0 dump
(and higher level dumps if possible), from any of our servers to the one and
only 8mm tape drive, on a single 8mm tape (excluding tape failure), with
minimum system down time, without human intervention, using existing features
of SunOS-4.1.1.

Don't want much do I? :-)

After this many years of UNIX experience among us there should be someone with
enough savvy to spare that he could share it with the rest of us?!

One more possibility. Suppose that a "spare" 1.4GB disk was available to
take a snapshot of any other disk in the system (<1.4GB). Is there a way
to use this disk to minimize downtime and do an unmounted dump of the "spare"
instead of the disk that was snapshotted? I've thought of most of the
arguments against it, so you need not discuss them. However, exceptional
ideas with features that counteract the negatives would be interesting.

        Daryl Crandall
        The Mitre Corporation
        daryl@mitre.org
        (703) 883-7278
#############################################################################
MY SUMMARY:

1) Use Sun SPARCserver Manager to mirror disk systems.

2) Use Legato's "Backpack"? software.

3) Use Delta Micro's "Budtool" software

4) Create a temporary file with 'touch' and check for its existence
   during execution of /etc/rc.single; then, if it exists, execute
   a customized backup script (several were offered) to perform the
   backup, remove the temp file, and complete boot to multi-user.
   Schedule dumps with crontab.

The following comments were noted:

        a) Is necessary to assure static file system.

        b) Is not necessary to assure static file system.

        c) Three level dump scheme too complex.

        d) System can be forced single user during dumps.

There seems to be a continuing difference of opinion for the necessity to
assure a static file system when doing a dump. Until someone with
demonstrable authority explains to me why it is not necessary to assure
a static file system during dump I'm going to continue to feel uneasy
if I dump a dynamic file system, especially level 0.

A two level dump scheme is OK but I think I'll stick to my three level
scheme. It gives me a set of weeklies from which I can quickly go back
several weeks at a time to locate something that was known to exist N number
of weeks ago but is missing now. Beyond about 6 weeks, most people can't
remember much detail anyway.

What should be created is a dump schedule scheme that permits two separate
ways of reconstructing the disks should any single tape be found unreadable.

I have not tested any of the scripts or techniques listed below. I have not
decided which technique we will use yet but suspect it will be point #4
above since I already have a script. All I need is a way to go to single user
and back again.

        Daryl Crandall
        The Mitre Corporation
        daryl@mitre.org

Thanks to:

dan@breeze.bellcore.com (Daniel Strick)
tgsmith@East.Sun.COM (Timothy G. Smith - Technical Consultant Sun Baltimore)
kaul@ee.eng.ohio-state.edu (Rich Kaul)
Eelco van Asperen <evas@cs.eur.nl>
P D Jowett <pdj@dcs.leeds.ac.uk>
jipping@frodo.cs.hope.edu (Mike Jipping)
randy@ncbi.nlm.nih.gov (Rand S. Huntzinger)
cdr@sachiko.acc.stolaf.edu
tony@Canada.Sun.COM (Tony Santos - Sys/Net Administrator Sun Toronto)
dfl@dretor.dciem.dnd.ca (Diane Luckmann)
Dan Butzer <butzer@cis.ohio-state.edu>
"Anthony A. Datri" <datri@concave.convex.com>
Steve Romig <romig@cis.ohio-state.edu>
fed!m1rcd00@uunet.uu.net
Mike Raffety <oconnor!miker@oddjob.uchicago.edu>
raw@beta.lanl.gov (Richard A. Wiley)
phillips@maui.Qualcomm.COM (Marc Phillips)
George A. Planansky <gplan@aer.com>
Bob Sutterfield <bob@morningstar.com>
gerry@jtsv16.jts.com (G. Roderick Singleton)
David Fetrow <fetrow@hardy.u.washington.edu>
vanandel@keel.atd.ucar.EDU
keves@meaddata.com (Brian Keves)
Ray Ballisti <ray@ifh.ethz.ch>
trr@lpi.liant.com (Terry Rasmussen)
#########################################################################
#########################################################################
EXCERPTED SUMMARY DETAIL FROM ALL RESPONDENTS:

There are a couple of problems to tackle:

1) Dump does not robustly handle live file systems

2) Dump's media management is pretty bad.
        - EOT is handled poorly
        - Tape errors are handled poorly
        - Volume management is lacking
        - The tape format is not very robust

#########################################################################
The best solution I know of that is currently available is Legato's
Networker package. Networker is not just a fancy wrapper around dump;
rather, it is a new software package designed from the ground up.

There is also a package by Delta Micro called Budtool which I believe
is a wrapper around dump but also knows how to do scheduling and some
volume management. Budtool is probably limited by dump's weaknesses
though (ie dumping live file systems is dangerous).

#########################################################################
You might talk to Mark Verber (verber@pacific.mps.ohio-state.edu) for
his scripts. Basically he modified rc.single to check for a
/do.backups so that when the system was shutdown to single user mode
with a /do.backups file the backup would fire off, then bring the
system back up after it finished. Makes for some rather nice,
unattended backups and it sounds like what you want.

#########################################################################
We do this by creating a file "/autobackup" with touch, rebooting the system with
"shutdown -r" and checking for the existence of "/autobackup" in the /etc/rc script.
If found, a backup-script is executed to do the dumps; when ready, the boot
sequence will be completed and the system will be in multi-user state again.
This is what we inserted in the /etc/rc file;

        .... (some stuff deleted)...
        if [ -f /etc/ld.so.cache ]; then
                #
                # Carefully delete ld.so cache in case it is corrupted.
                #
                mv /etc/ld.so.cache /etc/ld.so.cache-
                rm /etc/ld.so.cache-
        fi
        if [ -r /autobackup ]; then
                rm -f /autobackup
                echo Automatic backup in progress...
                if [ -f /etc/rc.backup ]; then
                        sh /etc/rc.backup
                fi
        fi
        if [ -r /fastboot ]; then
                rm -f /fastboot
        elif [ $1x = autobootx ]; then
                echo Automatic reboot in progress...
        else
                echo Multiuser startup in progress...
        fi
        date
        .... (more stuff deleted)...

The /etc/rc.backup script is quite straightforward; it reads the list of
filesystems to be backed up and issues a dump command for each of them.

        #! /bin/sh
        #
        # %W% %E% - auto backup
        #

        DUMPHOST="hathi:"

        DUMPTAB=/.dumptab # list of filesystems to backup
        BLOCKF=126 # the blocking factor
        BPI=54000 # tape-density (bpi)
        TAPELEN=6000 # length of tape (feet)
        TAPEDEV=${DUMPHOST}/dev/nrst1 # name of tape device

        PATH=/bin:/usr/bin
        export PATH

        cd /dev
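        # read table and strip comments:
        cat $DUMPTAB | grep -v '^#' | \
        while read f mountpoint
        do
                # dump filesystem $f;
                echo " dumping: $f mounted on: $mountpoint"
                /etc/dump 0bdsfu $BLOCKF $BPI $TAPELEN $TAPEDEV $f
                # save the status of dump; "$?" is overwritten by the test
                status=$?
                if [ $status -ne 0 ] ; then
                        echo "Boop "
                        exit $status
                fi
        done
        # the loop runs in a subshell (it is part of a pipeline), so pass
        # its exit status on instead of always reporting success
        status=$?
        if [ $status -ne 0 ] ; then
                exit $status
        fi
        echo "Beep "
        exit 0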

Here is our current /.dumptab file:

        #
        # List of file systems to backup;
        # the first field of a line should contain the name of the raw device.
        # Fields are separated by TAB character(s).
        #
        rid000a /
        rid000d /usr
        rid001d /usr/local/share
        rid000g /var
        rid000e /home/kaa/sys
        rid003e /home/kaa/cs
        rid001e /home/kaa/edu
        rid002e /home/kaa/ect
        rid001f /export
        rid003f /export/exec/pc

For complete automation, you could add something like
        touch /autobackup; shutdown -r +10 "full backup"
to your root crontab file.

So, the only thing we have to do in the morning, is to rewind the tape and store it.
(The rewinding could be added to the script but the current version allows you to
store multiple backups on the same tape.)

As you can see, our scripts are very simple but they work ok for us. Hope this helps,

#########################################################################
We work this trick for a variety of suns, vaxes and odds & sods. Just include
a fragment such as

if [ -r /ok.to.do.exadump ]; then
    /usr/bin/ncp set executor state restricted
    rm -f /ok.to.do.exadump
    /etc/doexadump </dev/console >/dev/console 2>&1
    /usr/bin/ncp set executor state on
fi

in your rc.local. When you want a dump to happen you then create the flag
file (/ok.to.do.exadump in this case) and put a shutdown -r in for some
time that suits. Your dump script (/etc/doexadump in this case) then runs
during reboot. The only time this doesn't work is if the disk fails fsck and
hangs around waiting for you to arrive in the morning to fix it.

#########################################################################
We do a level 0 once a week on about 1.7 GB spread out on 5 disks across
the network. Between level 0's, we do level 5 dumps, concatenating the
level 5 dumps onto each other on a single tape. We have one 2.3 GB
Exabyte 8mm drive and replace the tape only after the level 0 dump (the
next morning). Everything else is automatic.

We use a (rather large) shell script. It allows a lot of flexibility
as to what gets dumped when -- it's very configurable. We do the dumps
at 2:00 am -- without going to single user. The shell script we use is
called "sundump" and is available for anonymous FTP on smaug.cs.hope.edu
(35.197.146.1) as "pub/sundump.tar.Z".

We've been doing this for almost a year now. The only drawback I have
seen is that users must come to me to get their files restored. Unlike
the new "backup tools" (e.g., Legato Networker or Delta Microsystems'
BudTool) users cannot find their own files and request their own
restores.

#########################################################################
    What we do here is use cron to fire up a shell script sometime in the
middle of the night which does the following:

        1) Check for a "suppress dump" flag file - do nothing if found.
        2) Schedule a reboot using shutdown for some time later (say 6AM).
        3) Create a "trigger" file.

In the /etc/rc.local file we have (pretty much at the beginning) an if
statement which tests for the trigger file and fires off a copy of the
dump script if it's found. The trigger file is removed once the dump is
complete so subsequent reboots will not invoke dumps.
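
The three steps run from cron might look roughly like the sketch below. It is
not the respondent's script: the function name arm_backup and all file names
are placeholders, and the shutdown command is only printed rather than
executed so the sketch is safe to try out.

```sh
#!/bin/sh
# arm_backup SUPPRESS_FILE TRIGGER_FILE WHEN
# Step 1: do nothing (return 1) if the "suppress dump" flag file exists.
# Step 2: schedule the reboot; here the shutdown command is only printed.
# Step 3: create the trigger file that rc.local tests for on the way up.
# WHEN is passed to shutdown unchanged, e.g. "06:00" or "+10".
arm_backup() {
    suppress=$1
    trigger=$2
    when=$3
    if [ -f "$suppress" ]; then
        echo "dump suppressed by $suppress" >&2
        return 1
    fi
    # A real cron script would run this command instead of echoing it.
    echo "shutdown -r $when"
    date > "$trigger"
}
```

A call such as "arm_backup /etc/nodump /etc/trigger.backup 06:00" (both paths
hypothetical) would leave the trigger file behind for the rc.local test
described above.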

#########################################################################
   I have been using this technique for years writing either to a big empty
disk partition (in the older days) and to 8mm tape drives in more recent
times. Our scripts are pretty complex and tailored to our purposes. They
record the output of dump in log files, check to see if there is a tape in
the drive, send mail to the system administrators indicating whether the
job failed or not, etc. We only use this mechanism for full dumps on a
single machine, using it only for weekly (level 4) and daily (level 7)
incrementals on our bigger servers. [The full dumps on the larger servers
are two or three 8mm tapes each]. However, you could certainly schedule
fulls on different filesystems on different days, etc. if you like.
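
The logging part of such a script could be sketched as below. This is an
illustration only, not the respondent's code; the function name run_logged
and the "===" marker lines are made up for the example.

```sh
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, appends its output (stdout and stderr) to LOGFILE, then
# appends a one-line verdict and returns the command's exit status.
run_logged() {
    log=$1
    shift
    echo "=== `date`: $*" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    if [ $status -eq 0 ]; then
        echo "=== OK" >> "$log"
    else
        echo "=== FAILED (exit status $status)" >> "$log"
    fi
    return $status
}
```

A tape check such as "mt -f /dev/nrst1 status" and the dump command itself
could both be run through run_logged, and the resulting log mailed to the
administrators when the job finishes.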

One caution - you may have tape failures. So to minimize downtime, I'd
schedule the dumps shortly before you (or whoever is in charge of backups)
come in to work. That way you can intervene as necessary to get the system
up before anybody complains...
#########################################################################
[...] directories, then do level 1s or 0s on the weekend to our 2.3GB tape.

I would be most interested, though, in any responses you hear about
the Legato NetworkBackpack (I think that's the name) -- it's a commercial
package that automates backup... I'd rather buy a commercial package
than spend time developing and testing a set of backup scripts or such.

#########################################################################
Use your "spare" disk along with SPARCserver Metadisk driver.
This way you can mirror one drive/partition giving you redundancy.
You can also take one of the drives offline transparently
to the users.

#########################################################################
Contact romig@cis.ohio-state.edu. He's writing what you asked for.

Basically, we have 25 servers and 6 Exabytes, 30+ GB of disks.
The software will cause the Sun to fasthalt itself on schedule.
Before going down it will write a backup flag file in the root FS.
The rc files will look for this upon reboot and then if they find it
they will run Steve's backup software before the system goes multi
user. After the backup runs the system goes multiuser and is ready for
use.

You could obviously implement this easily yourself using rdump if you are
processing only 1 backup per night. Steve's stuff writes a database on
the tapes and is used to store multiple servers on each tape each night.
(We go down only Saturday night. We stay up 24hrs/day the rest of the week.)

#########################################################################

That sounds like an unnecessarily complex dump scheme.

Why have two levels of incrementals? Just do 0's and any other one level.

Dump 1/7 of your filesystems at level 0 every night, then dump them all
every night at level 1 (or 5, or whatever).
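
One way to pick tonight's share of the level 0 dumps is sketched below. It is
a guess at how such a rotation could be coded, not this respondent's script;
the function name level0_tonight and the round-robin assignment by list
position are assumptions.

```sh
#!/bin/sh
# level0_tonight DAY_INDEX FILESYSTEM...
# DAY_INDEX is 0-6 (for example from "date +%w", where 0 is Sunday).
# Filesystems are assigned to nights round-robin by their position in the
# argument list; the ones whose slot matches tonight are printed, one per
# line. Everything else would get the single incremental level.
level0_tonight() {
    day=$1
    shift
    i=0
    for fs in "$@"; do
        if [ $((i % 7)) -eq "$day" ]; then
            echo "$fs"
        fi
        i=$((i + 1))
    done
}
```

With nine filesystems listed, night 0 gets the first and the eighth, night 1
the second and the ninth, and so on.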

Oh yeah -- I'm amazed that some people actually take their machines down
to do backups. We'd never be able to get away with it. We always dump
live, and it's yet to cause a problem.

#########################################################################

    I'm curious to see if you've handled the problem of running out of tape.
    Do you recalculate the "length" parameter for rdump when you are storing
    multiple partitions on a single tape or do you trust that it won't run out
    of tape? Your ideas on this would be instructive.

Currently we plan things out so that we shouldn't ever run out of
tape. The daily and weekly backups from our ~30 servers fit on one
120-minute tape, and we could easily split that into 2 or more tapes if we
needed to. All of the full saves for each server fit on 1 tape each
with room to spare. So we don't do anything real smart about EOT yet.
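
Planning so that the tape never fills up amounts to simple arithmetic, which
could be checked before the dumps start. The sketch below is illustrative
only: the function name fits_on_tape is invented, and where the size
estimates come from (for instance the "used" column of df) is left to the
caller.

```sh
#!/bin/sh
# fits_on_tape CAPACITY_KB SIZE_KB...
# Adds up the estimated dump sizes and prints the total and the space
# left over. Succeeds (exit 0) only if the total does not exceed the
# capacity.
fits_on_tape() {
    capacity=$1
    shift
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    echo "total=$total remaining=$((capacity - total))"
    [ "$total" -le "$capacity" ]
}
```

For a tape taken to hold 2300000 KB, "fits_on_tape 2300000 900000 700000
500000" succeeds, while two dumps of 1500000 and 900000 KB would not fit.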

I'm in the middle of adding proper EOT handling to BSD dump. Once
that's done, it should (I think) become available with the rest of the
BSD released code. Eventually we're going to add some smarts to the
backup software to allow for automatic tape rollover (eg, rather than
have dump ask me to mount tape 2, it'll automatically switch to a new
drive on a possibly different host that already has tape 2 mounted, so
that we can continue to do unattended backups in the wee hours of the
am...) But that's still a pipe dream...

#########################################################################

Thanks to everyone who has responded to my question about the Sun
Metadisk driver. The metadisk driver seems to be a working product
that does more good than harm, but one could imagine better. Better
in fact may be coming from Sun, something that I will look in to.
#########################################################################

Why not do the level zero dumps in multi-user mode? A simple cron
entry will do it then. We've done that for years, with many gigabytes
of disk space, with never a problem attributable to that.

It's just one huge pain to try to figure out how to come down single-user,
do the dumps, and come back up multi-user automatically ... don't bother.

Years ago, it was more important that dumps be done with a totally
quiescent filesystem; Sun and Berkeley have improved dump (and restore)
since then to be much more robust in this respect.

#########################################################################
        We have automated backups here. The lengthy script of ours does
a level 9 each night, a level 0 the first Saturday of the month, and a
level 0 on all the other Saturdays of the month.

We have an 8mm drive backing up about 15 machines. Unfortunately, this
means that a level 0 takes about 12 gigs, or 6 8mm tapes, of which each
tape takes four hours to dump to. So, although the entire dump is automated,
you do have to have someone switching the tapes every four to five hours
(so we span ours from Saturday to Sunday on a level 0).

All of the backups are done from a script which runs on our 4/490 which
has the 8mm exabyte drive on it. The only thing I am confused about is
you wanting to unmount partitions. The entire backup runs while all of
our systems are in use. Why would you want to unmount partitions????
#########################################################################

Daryl, we just do level 0 backups of our whole system over Monday &
Tuesday -- about 4 Gbytes -- in multiuser mode and tell
our users not to run anything on the affected partitions between 1 am
and 5 am . Every night we do level 5 backups of anything not on level
0; so we have a one week level zero cycle, at the risk/cost of those
morning hour multiuser modes.

Our biggest problem is the time it takes to do remote dumps. I wish
we had a second exabyte unit for peace of mind, also.

Since a scsi exabyte unit can be had for $2300 or less, I'd guess that
a second exabyte would be real attractive for your configuration, but
there's nothing like a power failure to concentrate the mind.

That's not much help to you, I know; I'll be interested in what you net
in your summary.

What are you getting for UPS units? The power company likes to brown
us out here on summer weekends.

#########################################################################

Look at Steve Romig's stuff in tut.cis.ohio-state.edu:pub/backup/*.

#########################################################################

I have an elderly suite that forces a system single user during dump(8)
sessions. I assume you're interested so have attached the description.

Let me know if you want more. (WARNING: you may have to hack a lot, I
can't tell)

README:

                        backup

This program should be run set-uid root. At genentech, we set the
permissions to 4550 with group operator. Backup was originally a
simple experiment to play around with yacc, but it has proven to be
a very useful tool. Daily backups are normally performed using
the "-m" option, which allows the system to continue running. Without
this option, all running processes are sent "SIGHUP", any processes
remaining after the SIGHUP are SIGSTOPped for the duration of the
backup. After backup is complete, a SIGCONT is sent to all of the
stopped processes.

Some notes about exclusions and special cases:
        There is a list of processes to be excluded from either HUP or
        STOP in proclist.c. In addition, some programs such as daemons,
        etc. require special restarting. The table for this is in
        proclist.c also.

                                        10/2/87

and manpage:

.TH BACKUP 8
.SH NAME
backup \- perform tape backups
.SH SYNOPSIS
.B backup
[ options ] [ date ]
.SH DESCRIPTION
.vs -1p
.I backup
reads the file
.B /etc/backup_dates
and runs
.B dump(8)
to backup the specified filesystems. If no
.I date
is given, the current date is used.
The
.I backup_dates
file consists of lines of 6 fields each. The fields are separated by spaces
or tabs. The fields specify the day of the week (Sun-Sat), the week of the
month (1-5), the month (Jan-Dec), the dump level to perform (0-9), the
filesystem, and an arbitrary message to be delivered to the operator upon
completion of the dump. Each of these fields may contain a range or
a comma-separated list, or an asterisk meaning all legal values.
.PP
Options for
.I backup
are:
.TP
.B \-m
Run
.I backup
in multiuser mode. Usually backup will kill any foreground processes
and suspend any background processes so that the filesystems will be
guaranteed to be quiescent. When run in multiuser mode, no action
is taken to insure that there is no filesystem activity.
.TP
.BI \-b blocksize
Sets the tape blocking factor to
.I blocksize.
The default is 32.
.TP
.BI \-s tapelength
Sets the assumed length of the tape to
.I tapelength.
This defines how many blocks are written to each tape. The default is
2300 feet.
.TP
.BI \-d density
Sets the tape density to
.I density.
This is normally 6250 BPI.
.TP
.BI \-t device
Sets the device to use to
.I device.
This is /dev/rmt9 on our systems which corresponds to the TU78 tape drives
at 6250 BPI.
.TP
.BI \-f file
Causes backup to read the filesystem dump schedule from the file
.I file
instead of from /etc/backup_dates.
.TP
.BI \-w minutes
Sets the warning time to give users when running in single-user mode.
.SH BACKUP_DATES Example
.nf
# Backup schedule for Genie - Genentech, Inc.
# Daily level 9's
Tue-Fri * * 9 /va "/va dump complete"
# Monday dumps - level 1's
Mon 2-5 * 1 / "root dump complete"
Mon 2,3,4,5 * 1 /va "/va dump complete"
# Monthly level 0's
Mon 1 * 0 / "root dump complete"
Mon 1 * 0 /va "/va dump complete"
.fi
.SH AUTHOR
Scooter Morris - Genentech, Inc.
.SH SEE ALSO
dump(8), shutdown(8), backup_dates(5)
.SH BUGS

#########################################################################

 One thing I've found useful is "compress". On a SPARCstation
using compress on tar (we use tar rather than dump) works out
rather well. It doesn't slow the tape down all that much (who
cares on overnight backups anyway) but does almost double my
tape capacity which halves the cycle time for complete backups.
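
A minimal sketch of the tar-through-compress idea follows. It is not this
respondent's script: the function name archive_compressed is invented, and
the optional third argument exists only so that another filter can be
substituted for compress.

```sh
#!/bin/sh
# archive_compressed DIR OUTPUT [COMPRESSOR]
# Writes a tar archive of DIR to OUTPUT (a file or a tape device), piping
# it through COMPRESSOR on the way. COMPRESSOR defaults to "compress" and
# must read standard input and write standard output.
archive_compressed() {
    dir=$1
    out=$2
    comp=${3:-compress}
    ( cd "$dir" && tar cf - . ) | $comp > "$out"
}
```

An archive written this way with compress can be read back with something
like "zcat < OUTPUT | tar xf -".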

#########################################################################

Here's the hack I use:
put an entry in root's crontab:
30 4 * * 1,2,3,4,5 /usr/local/adm/start.backup >/dev/null 2>&1
----------------------
This is just a simple script:
#!/bin/csh -f
# trigger the start of a single user backup
#
set path=( /bin /usr/bin /usr/ucb /etc /usr/etc )
set PN=`basename $0`
# might want -f to do fast reboot, but better to fsck disks before
# backing up
set shut_opt=-r

# DEFINE DELAY
set delay_in_mins=5
set delay_in_secs=`expr $delay_in_mins \* 60`
set ctlFile=/etc/trigger.backup

set msg_to_wall="$PN : single user full backup to start in $delay_in_mins minutes"

echo $msg_to_wall | wall -a
date >$ctlFile
shutdown ${shut_opt} +${delay_in_mins} "reboot single user for full backup - log off now"
------------------------------------

Now, just before the end of /etc/rc.single:
        # conditionally start a backup to EXB if the trigger file exists
        # (this file created by a cron job that runs in the early morning)
        # remove the trigger file BEFORE backing up, in case we have to
        # reboot the system to recover from any problems!
        if [ -r /etc/trigger.backup ]; then
                rm -f /etc/trigger.backup
                (echo "starting single user backup - please wait" >/dev/console)
                intr /etc/exb.backup > /dev/console 2>&1
        fi
-------------------------------------

Basically, the system reboots, and as it is coming up, sees that it should do
a backup before going multi- user. My script dumps to a local exabyte,
so I have it easy. You'd have to insure that the network is configured up, so
you can dump to that remote exabyte.

I'm also surprised that Sun doesn't give better advice on how to run clean,
automated, single user backups. Do they think we have nothing better to do
than boot systems single user at 2:00 AM ourselves? -:)
#########################################################################

I never try to do anything fancy for full dumps. I just leave the system
running normally and do a 0 dump late at night.

I have scripts that do all of my unattended backups. Here is my strategy:

1. Rotate 0 dumps on servers throughout the week.
2. Do 5 dumps of all systems each day, including the system(s) that had a
   0 dump today. This makes sure you get most of what the 0 dump missed.

This allows me to backup all of my systems over the network to one
backup tape each evening. The only human intervention is reviewing the
dump logs and changing the tape each day.

I use the 5 dumps only, since this will dump everything changed since
the last 0 dump. This makes it easier to recover everything without
leaving lots of "removed" files around and going through up to 7
incremental tapes. This can be VERY time consuming when you are talking
about fast forwarding through 7 Exabyte tapes.
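
The point about restore effort can be made concrete with a small sketch that
works out which tapes a full restore needs. It relies only on the standard
meaning of dump levels (a dump at level N saves what changed since the most
recent dump at a lower level); the function name restore_chain and the input
format are invented for the example.

```sh
#!/bin/sh
# restore_chain
# Reads a dump history on standard input, oldest first, one dump per line
# in the form "<level> <label>". Prints, in restore order, the labels of
# the dumps needed to rebuild the filesystem as of the latest dump.
restore_chain() {
    awk 'NF >= 2 { n++; level[n] = $1 + 0; label[n] = $2 }
         END {
             # Walk back from the newest dump, keeping each dump whose
             # level is lower than every level kept so far; stop at 0.
             limit = 10; kept = 0
             for (i = n; i >= 1 && limit > 0; i--)
                 if (level[i] < limit) { keep[++kept] = label[i]; limit = level[i] }
             for (i = kept; i >= 1; i--) print keep[i]
         }'
}
```

With a level 0 followed only by level 5 dumps, the chain is always two tapes:
the level 0 and the latest level 5. A 0/7/9 history needs at most three.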

If you would like I can mail you my scripts. They are very straight
forward, except for some finagling trying to figure out how much space I
still have on the Exabyte tape.
#########################################################################
------------------------------ Start of body part 1

Dear Daryl, like many of us, I am not a full-time administrator, and like almost
everybody I have little time to dedicate to the machines. So I send you my script
that I am using to make an automated dump of my servers (one each night)
using a cron job. The idea is to have a quiet file system, even with the
machine staying multiuser. I know that this is not THE solution, but it
works well in our environment (small community of users, mostly working
during the day. Only batch jobs are running in the night).
I am looking forward to some better ideas (your summary!).
Good luck. Ciao Ray

------------------------------ Start of body part 2

#!/bin/sh
#File name: sirius:/usr/local/share/src/dump_job
# or ~ray/src/dump_job or /dump_job
#Ray Ballisti, July 1990. Last modification: 31st May 1991
#Usage(example): at -s 3am Friday /dump_job
#This script stops running jobs, inhibits new login and logouts users,
#does a full 0-dump of the file systems and restarts the jobs.
#This script is supposed to be run from root and not by any user
#so it is not necessary to give it the set-uid permission.
#By any change to this rule one should consider to modify the searching
#for the jobs running below.
# Set safe search path (shell script runs as root)
PATH=/usr/bin:/usr/ucb:/usr/etc:/bin ; export PATH
# - - - - - - define some local parameters: start - - - - - - - - - -
# edit name of the Exabyte tape host (in our case <sirius>):
TAPE_HOST=sirius
HOSTNAME=`hostname`
#command name (set to default. Will be modified later if necessary):
COMMAND=/usr/etc/dump
#parameters:
# dump level 0; b:block factor=124 [local], 20 [network] ; f: dump-file
# u:update the dump record; n:notify (not set); s:size in feet of the tape
PARAM='0ubfs'
BLOCK=124
#tape unit:
UNIT=/dev/nrsmt0
#tape length is 103400 for a P5-60 cartridge ==> 1484 MBytes
# (which corresponds approx. to a P6-90 one ==> 1548 MBytes)
#tape length is approx. 155100 for a P5-90 cartridge ==> 2226 MBytes
# (which is a little more than a P6-120 cartridge ==> 2044 MBytes)
#edit the right value here:
LENGHT=103400
# - - -******** <== to be edited !!
# - - - - - - define some local parameters: done - - - - - - - - - -
# - - - - - - Make some checks: start - - - - - - - - - - - - - - -
trap 'exit' 1 2
logfile=dump_report.$$
\rm -f /$logfile
echo 'Dump on Exabyte. Procedure start at: ' `date` >/${logfile}
echo "Executed from machine $HOSTNAME . Tape host is $TAPE_HOST" >>/${logfile}
echo "Dump_job from machine $HOSTNAME . Tape host is $TAPE_HOST"
#
#check for tape unit and cartridge loaded:
echo -n 'checking tape-unit and cartridge: ' >>/${logfile}
if [ $HOSTNAME != $TAPE_HOST ] ; then
 # first check if Exabyte unit is free (file /usr/spool/locks/exabyte does
 # not exist) or if norewind is allowed (above file exists):
 # notice the <-n> option for rsh! (input from /dev/null)
 # Also the pipe is executed locally!
  numr=`rsh $TAPE_HOST -n cat /usr/spool/locks/exabyte | wc -l`
 # if numr is zero, then the command was successful (cat of empty existing file);
 # then either the unit is busy or norewind is compulsory (for instance backup of
 # two different machines onto the same tape):
 if [ $numr -eq 0 ] ; then
    echo "file /usr/spool/locks/exabyte exists: no rewind of tape"
  else
number=`rsh $TAPE_HOST -n mt -f $UNIT rewind | wc -l | sed 's/ *\([0-9][0-9]*\) *.*/\1/' `
        if [ $number -ne 0 ] ; then
         echo 'Tape unit or cartridge missing. Abort' >>/${logfile}
         exit 1
        fi
  fi
# adjust for remote operation:
        UNIT=$TAPE_HOST:$UNIT
        COMMAND=/usr/etc/rdump
        BLOCK=20
else
  # check for lock file as above:
  if [ -f /usr/spool/locks/exabyte ] ; then
    echo "file /usr/spool/locks/exabyte exists: no rewind of tape"
  else
        mt -f $UNIT rewind
        if [ $? -ne 0 ] ; then
        echo 'Tape unit or cartridge missing. Abort' >>/${logfile}
        exit 1
        fi
   fi
fi
echo done >>/${logfile}
echo doing things to the system... >>/${logfile}
# no login allowed any more:
touch /etc/nologin >>/${logfile}
echo "further logins inhibited:"
echo "Check: `ls -l /etc/nologin` " >>/${logfile}
trap '\rm -f /etc/nologin ; echo "trap abort" >>/${logfile} ; exit' 1 2 4 10
#check for any user logged in:
echo -n 'checking for users logged in:' >>/${logfile}
who | grep -v -s "root"
        if [ $? -eq 0 ] ; then
         echo done >>/${logfile}
         echo '==> somebody is logged in . Simulate shutdown' >>/${logfile}
         /usr/etc/shutdown -k +5 'Automatic dump: down for ca. 1 hour' 2>/dev/null
         sleep 300
        else
         echo done >>/${logfile}
        fi
#do it anyway:
wall <<EOF 2>/dev/null
Last message from the automatic dump procedure
System down in 60 seconds. Please logout NOW.
System up again in ca. one hour.
EOF
        sleep 60
#
listu=/tmp/listu.$$
who | grep -v "root" |sed 's/^\([a-z][a-z]*\) *.*/\1/' >$listu
nrusers=`wc -l $listu | sed 's/ *\([0-9][0-9]*\) *.*/\1/' `
if [ $nrusers != 0 ] ; then
        echo 'The following users did not logout:' >>/${logfile}
        who >>/${logfile}
        echo 'Their login job will be killed now' >>/${logfile}
fi
\rm -f $listu
# do not trust "who" and recheck everything:
#find and kill rlogins:
for us in `ps aux | sed -n 's/[a-z][a-z]* *\([0-9][0-9]*\) *.*in\.rlogind/\1/p'`
do kill -9 $us ; echo "rlogin $us killed" >>/${logfile}
done
#consider telnet users:
for us in `ps aux | sed -n 's/[a-z][a-z]* *\([0-9][0-9]*\) *.*in\.telnetd/\1/p'`
do kill -9 $us ; echo "telnet job $us killed" >>/${logfile}
done
#consider ftp-users:
for us in `ps aux | sed -n 's/[a-z][a-z]* *\([0-9][0-9]*\) *.*in\.ftpd/\1/p'`
do kill -9 $us ; echo "ftp job $us killed" >>/${logfile}
done
#Now look for jobs running as <batch> or <at> jobs or whatsoever:
jobs=/tmp/job_numbers.$$
echo "checking for running jobs (other than system):" >>/${logfile}
        ps aux |sed -e /root/d -e /daemon/d -e /bin/d -e /ean/d \
      -e 's/^[a-z][a-z]* *\([0-9]*\).*/\1/p' -n | sort -n >$jobs
        nrjbs=`wc -l $jobs | sed 's/ *\([0-9][0-9]*\) *.*/\1/' `
if [ $nrjbs != 0 ] ; then
        echo "There are $nrjbs jobs running" >>/${logfile}
#jobs are stopped with increasing id-number (because of batch-jobs)
        for n in `cat $jobs`
        do echo stopping job no $n >>/${logfile}
        kill -STOP $n
        done
else
        echo 'no batch-jobs running in this moment' >>/${logfile}
fi
# check the atq queue:
# ... not implemented yet ...
# stop the batch daemon (to be restarted later) if present:
if [ -x /usr/local/lib/batchd ] ; then
idbatchd=`ps aux | grep batchd | sed -e '/grep/d' | \
          sed -n -e 's/root *\([0-9][0-9]*\) .*/\1/p' `
kill -STOP $idbatchd
fi
# - - - - - check diskless clients (if server with NIS): start
if [ -d /tftpboot -a -d /var/yp/`domainname` ] ; then
                lclient=/tmp/list_client.$$
ypcat bootparams | sed -e 's/^.*\/root\/\([a-z][a-zA-Z0-9_]*\).*/\1/' >$lclient
                for client in `cat $lclient`
                do if /usr/etc/ping $client 2 1>/dev/null
                        then
                        echo $client is alive >>/${logfile}
                # ... action ... rsh -l ray start_shell .?. perhaps..
                        echo going on anyway for now... >>/${logfile}
                          fi
                done
                \rm -f $lclient
fi
# - - - - - check diskless clients: done
#
# closing local ethernet port:
#for servers only ==> here sirius, ife0 and betelgeuze:
case $HOSTNAME in
        sirius| betelgeuze | ife0)
                  closeie0=`echo "ifconfig ie0 down " `
                echo "closing local ethernet port ie0:" >>/${logfile}
                echo "command is: $closeie0 " >>/${logfile}
                $closeie0 >>/${logfile}
                echo "done" >>/${logfile}
                 ;;
        *) echo "The ETHERNET port will NOT be closed" >>/${logfile}
                ;;
esac
# - - - - - - Make some checks: done - - - - - - - - - - - - - - -
# We suppose now that nobody is using the file system:
sync
sync
# you can dump all the partitions in /etc/fstab automatically
# or choose them out manually
#edit the dump commands below:
#for part in \
#`cat /etc/fstab |sed -e 's/^\(\/dev\/[sx][dy][0-4][a-h]\).*/\1/p' -n`
#do
# /usr/etc/fsck -p /dev/$part >>/${logfile}
#echo "dumping partition $part. Time is: " `date` >>/${logfile}
#$COMMAND $PARAM $BLOCK $UNIT $LENGHT /dev/r${part} >>/${logfile}
#done
# choosing the partitions manually:
partlist=
case $HOSTNAME in
        sirius) partlist="xd0h xd1h xd1f xd0g xd0a xd0f xd0e xd1e" ;;
        betelgeuze) partlist="sd0a sd0d sd2g sd0f sd0g" ;;
        mizar) partlist="sd0a sd2f sd3d sd0g " ;;
        regulus) partlist="sd0a sd0g" ;;
        algol) partlist="roota rooth rootg" ;;
        ife0) partlist="xd0a xd0f xd0d xd1g xd0g" ;;
        *) echo "please, customize yourself the partition list" >>/${logfile} ;;
esac
# start the main job: dump the file system
   for part in $partlist
   do
        # check the file system:
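        /usr/etc/fsck -p /dev/$part >>/${logfile}
        # save the status at once: after the test below, $? holds the
        # result of the test itself, not of fsck
        fsck_status=$?
        if [ $fsck_status -ne 0 ] ; then
        echo "file system check failed: error= $fsck_status " >>/${logfile}
        echo "dump will be done anyway, but do not trust it" >>/${logfile}
        fi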
        #
        echo "starting dump for /dev/r$part at " `date` >>/${logfile}
echo "command is: $COMMAND $PARAM $BLOCK $UNIT $LENGHT /dev/r${part} " >>/${logfile}
         $COMMAND $PARAM $BLOCK $UNIT $LENGHT /dev/r${part} >>/${logfile}
        echo "end dump for /dev/r$part at " `date` >>/${logfile}
   done
#
# -------------------------------------------------------------
# end of dump section. Now restore all:
# allows login again:
\rm -f /etc/nologin
# reopening local ethernet port for sirius , betelgeuze and ife0 only:
case $HOSTNAME in
        sirius|betelgeuze|ife0 )
                openie0=` echo $closeie0 | sed -e 's/down/up/' `
          echo "restarting ie0 with command: " >>/${logfile}
          echo "$openie0 " >>/${logfile}
                  $openie0 >>/${logfile}
                echo done >>/${logfile}
        ;;
        *) ;;
esac
if [ $nrjbs != 0 ] ; then
        #restart jobs (batchs) in reverse order:
        for n in `sort -rn $jobs`
        do
        echo restarting job no $n >>/${logfile}
        kill -CONT $n
        done
fi
\rm -f $jobs
#restart batch-daemon:
if [ -x /usr/local/lib/batchd ] ; then
        echo -n "restarting batch daemon:" >>/${logfile}
        kill -CONT $idbatchd
        echo done >>/${logfile}
fi
# unload the cartridge but before reset the UNIT value:
echo 'Unloading the cartridge: please close the cartridge-unit`s door' >>/${logfile}
UNIT=/dev/rsmt0
if [ $HOSTNAME != $TAPE_HOST ] ; then
        rsh $TAPE_HOST mt -f $UNIT offline
        rsh $TAPE_HOST rm -f /usr/spool/locks/exabyte
else
        mt -f $UNIT offline
        \rm -f /usr/spool/locks/exabyte
fi
# end of script:
echo "procedure end at: " `date` >>/${logfile}
exit
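
A minimal sketch of an alternative way to collect the process ids, using awk
to select by column rather than the sed expressions above. It assumes that
"ps aux" prints a header line, the owner in column 1 and the pid in column 2,
and that the columns do not run together. The function names and the list of
skipped owners are hypothetical; adjust them to the local system.

```shell
#!/bin/sh
# Sketch: pick PIDs out of "ps aux"-style text by column rather than by regex.
# Both functions read ps output on stdin and skip the header line.

# Print the PID of every process whose owner is not in the skip list.
# The skip list (root, daemon, bin) is an assumption, not a complete one.
non_system_pids() {
    awk 'NR > 1 && $1 != "root" && $1 != "daemon" && $1 != "bin" { print $2 }' |
        sort -n
}

# Print the PID of every process whose line contains a fixed string,
# for example in.rlogind, in.telnetd or in.ftpd.
pids_matching() {           # $1 = fixed string to look for
    awk -v pat="$1" 'NR > 1 && index($0, pat) > 0 { print $2 }'
}
```

Typical use would be "ps aux | non_system_pids >$jobs" or
"ps aux | pids_matching in.rlogind".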


------------------------------------------------------------------
Raymond Ballisti (Ray),
Electromagnetics Group, Swiss Federal Institute of Technology (ETH)
ETH-Zentrum, CH-8092 Zurich, Switzerland.
Phone: ++41 1 256 2753; Fax: ++41 1 261 1026
e-mail: EAN/ARPA/BITNET/INTERNET ---> ray@ifh.ethz.ch
------------------------------------------------------------------

#########################################################################

If you are using the Delta Microsystems drivers and utilities,
you could use "budtool". If you are stuck with character-based
monitors (as I am), I have some shell scripts that back up the
7.5GB of our systems that we are interested in onto four
8mm tapes. The scheme is as follows (we have two 8mm drives that
are at the same firmware and software level):

 Saturday morning, two level 0 dump sets are done using remote
 processes (so the machines have to be up and networked), with
 dump piped into the bdd command, which then writes the tape.
 Before the backups actually run, the following three steps
 are done: 1) The batch queues are shut off.
            2) All users are logged off and /etc/nologin is touched
               on all machines.
            3) All "idle" non-system processes are killed.

 When the backups are done the batch queues are re-started and
 /etc/nologin is removed on all systems.
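
 A minimal sketch of step 2 and its undo, assuming a single machine: logins
 are blocked only while the backup command runs, and /etc/nologin is removed
 again even if the command fails or the script is interrupted. The function
 name and the NOLOGIN override are hypothetical; the batch queue and process
 handling are not shown.

```shell
#!/bin/sh
# Sketch: block logins for the duration of a backup and always unblock after.
# NOLOGIN defaults to /etc/nologin; overriding it is only meant for dry runs.
NOLOGIN=${NOLOGIN:-/etc/nologin}

with_logins_blocked() {     # "$@" = backup command to run
    touch "$NOLOGIN" || return 1
    # on hangup, interrupt or terminate: unblock logins, then give up
    trap 'rm -f "$NOLOGIN"; exit 1' 1 2 15
    "$@"
    status=$?
    rm -f "$NOLOGIN"
    trap - 1 2 15           # back to the default signal handling
    return $status
}
```

 It would be called as "with_logins_blocked some_backup_command args...",
 and it returns the exit status of that command.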

 The same thing is done on Sunday or Friday when I am going on a
 vacation.

 During the week a level 1 incremental is done to all file systems
 using the same tape. This way, restoring a file system takes
 only two restores: the last full level 0 backup and
 the last level 1 incremental backup.
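
 The two-step restore can be sketched as a small helper that picks the dumps
 to restore from a log of past dumps. The log format ("LEVEL YYYYMMDD LABEL",
 one dump per line) and the function name are assumptions made for the
 example, not part of the scripts described here.

```shell
#!/bin/sh
# Sketch: choose which dumps to restore, and in what order.
# Input on stdin: one line per dump, "LEVEL YYYYMMDD LABEL" (hypothetical
# log format). Output: the newest level 0, then the newest level 1 that was
# made after it, if there is one.
restore_plan() {
    sort -k2,2n | awk '
        $1 == 0 { full = $0; incr = "" }   # a newer full dump supersedes older incrementals
        $1 == 1 { incr = $0 }
        END {
            if (full != "") print full
            if (full != "" && incr != "") print incr
        }'
}
```

 With no level 0 in the log the helper prints nothing, since an incremental
 alone cannot rebuild a file system.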

The tapes produced in this way have been very reliable for the last
two years. I've been able to recover machines whose system disk
had been trashed by booting the machine as a diskless client, mounting
each partition in turn on /mnt (after the surface scan and format, of
course), restoring the root and usr file systems, and running installboot
to get a working system with a minimum of muss and fuss.

We don't have any Sun provided hardware or software for 8mm drives,
we only have Delta Microsystems external SCSI drives and software,
so I can't tell what the Sun provided software is like. My scripts
rely on the Delta Microsystems bdd and rwtoc and mts commands to run.

What we are using now is all written in Bourne shell.

If you are interested let me know and I will mail you the script
and its support files, but if you don't have the Delta Microsystems
software installed then the scripts will probably be of very little
use to you. Either way I would suggest using budtool if you can; it
looks neat (when I wrote the earlier version of the script, budtool
was not yet available). The only reason we are not currently using
budtool (which requires the user to be running SunView) is that the
8mm drives are all hooked up to file servers using character monitors
in vt100 mode.

Either way, lots of luck braving the storms. Hopefully those UPS's
aren't too far off in the future.
#########################################################################
#########################################################################



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:06:15 CDT