SUMMARY: How does one "unblock" logins when mailhost goes down?

From: Alek O. Komarnitsky (alek@spatial.com)
Date: Mon Nov 04 1991 - 05:21:50 CST


[The original post]
I suspect this is either a real easy question, or a gnarly problem that
isn't easily solved. I appreciate comments/suggestions either way.

Like many sites, we depend on NFS. For instance, /usr/local is mounted
from a central file-server on a number of different machines to provide
localization. We mount it with options: bg, intr. That way, if our
file server goes down, one can type Ctrl-C to interrupt things. Our setup
is actually a bit more complex than that, including switchover (via symlinks)
to a backup server upon detection of the primary server failing. I believe
I've got that problem licked in case of disaster.
[We've used the automounter for quite some time, and I've had it as an
action item to use amd to solve this problem in a better way for a while.
I finally got around to researching it this weekend, and it looks good]

However, the file server is also our mailhost, i.e. /var/spool/mail is
mounted on the client Suns (but not the other machines). The problem here
is that if mailhost goes down, one cannot log in ... even though that
filesystem is also mounted intr,bg. The login process just hangs despite
Ctrl-C, Ctrl-\, etc. I believe the hang happens during the mail -e run by login.

Is there something that I and/or the user can do to get around this?
I'd like them to be able to log in - not being able to read their E-mail
is OK for the moment. Am I missing something obvious here?

Several people recommended that all users have a .hushlogin file in their
home directory. While that will certainly work, users like to see the
"new mail" message (along with motd and the other login stuff), and it's also
difficult to get *all* users to change (i.e. I was looking for a system-level
change). It was also suggested to unset mail in the .cshrc file; the same
concerns apply. Regardless, mucho thanx to the following
people who made suggestions along these lines:
Tim Becker <becker@cs.rochester.edu>
Jay Plett <jay@silence.princeton.nj.us>
etnibsd!vsh@uunet.UU.NET (Steve Harris)
alison@c255.ucsf.EDU (ABoeckmann)
Gerald Justice <justice@dao.nrc.ca>
Jon Peatfield (on kronos) <jp107@amtp.cam.ac.uk>

pbh@CFSMO.Honeywell.COM (Paul Henninger) wrote:
to increase robustness, i am using the automounter to mount upon
demand a user's mail directory. i do this by running, from a global
.cshrc file, these lines:

set mailbox_host = `ypmatch $USER aliases | awk -F"@" '{ print $2 }'`
if ("$mailbox_host" != "") then
   setenv MAIL /mnt/$mailbox_host/var/spool/mail/$USER
endif
unset mailbox_host

   the format of my aliases map is "USER USER@MACHINE".

   this method allows me to direct a user's mail to any machine (or
all users' mail to the same machine) and allows the user to log into
any machine and read their mail. if the machine handling a user's
mail is down, the mount will time out instead of hanging.
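For sites that would rather do this lookup in a compiled helper than in a global .cshrc, here is a minimal C sketch of the same parsing step. It assumes, as above, that the aliases value has the form user@machine and that servers are automounted under /mnt/<machine>; the function name mail_path_from_alias is hypothetical, and fetching the value from NIS (the ypmatch step) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the .cshrc logic above. alias_value is the
 * user's aliases entry, assumed to look like "user@machine"; the result is
 * the automounted mailbox path /mnt/<machine>/var/spool/mail/<user>.
 * Returns 0 on success, or -1 when there is no host part (the case where
 * the .cshrc leaves MAIL alone) or the buffer is too small.
 */
int mail_path_from_alias(const char *alias_value, const char *user,
                         char *buf, size_t bufsize)
{
    const char *at = strchr(alias_value, '@');
    if (at == NULL)
        return -1;

    const char *host = at + 1;
    /* Host name ends at whitespace, a comma, or another '@'. */
    size_t hostlen = strcspn(host, "@, \t\n");
    if (hostlen == 0)
        return -1;

    int n = snprintf(buf, bufsize, "/mnt/%.*s/var/spool/mail/%s",
                     (int)hostlen, host, user);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;                 /* output would not fit */
    return 0;
}
```

The caller would set MAIL only when the function returns 0, which matches the empty-host test in the .cshrc version.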

   could you please give me some more information on how you have used
symlinks to switchover in case of a server going down? i am always
looking for good ideas on disaster avoidance and recovery. thanks.
[Pls see my comments at the end about amd. Also, we don't usually mount
a directory "where" it's going to be used, but rather on another mount point,
and then symlink to that. One can then "switch" the symlink if needed.
amd basically does this in a very smart & robust way. Note, of course, that
if a process is using a file when the server goes down, you're out of luck.
However, new accesses shouldn't block]
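Switching such a symlink by hand can be done without a moment where the link is missing: create the new link under a temporary name and rename() it over the old one, since POSIX rename() keeps the destination name visible throughout when it replaces an existing entry. A minimal C sketch; switch_symlink and the paths in its comment are hypothetical, and, as noted above, it does nothing for processes that already have files open on the dead server.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Repoint the symlink link_path at new_target: make the new link under a
 * temporary name, then rename() it over the old link so that link_path
 * always exists. Example (hypothetical paths):
 *     switch_symlink("/usr/local", "/mnt/backup/usr.local");
 * Returns 0 on success, -1 on failure (old link left in place).
 */
int switch_symlink(const char *link_path, const char *new_target)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new.%ld", link_path, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    unlink(tmp);                       /* clear a leftover from a failed run */
    if (symlink(new_target, tmp) != 0)
        return -1;
    if (rename(tmp, link_path) != 0) { /* replaces the link, not its target */
        unlink(tmp);
        return -1;
    }
    return 0;
}
```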

John DiMarco <jdd@db.toronto.edu> wrote:
In list.sun-managers you write:
>I suspect this is either a real easy question, or a gnarly problem that
>isn't easily solved. I appreciate comments/suggestions either way.

It's gnarly, if you don't have access to source.

>However, the file-server is also our mailhost. I.e. /var/spool/mail is
>mounted on the client Sun's (but not the other machines). The problem here
>is that if mailhost goes down, one can not login ... even though that
>filesystem is also mounted: intr,bg. The login process just hangs despite
>cntl-C, cntl-\, etc. I believe the hang is during the mail -e by login.

Yup. Either patch login, or patch mail to check if the nfs server is up
before checking the mailbox. We've patched login, since we want to allow
logins even when the home directory server is down.

chron!magic706!sysnmc@uunet.UU.NET (Matt Cohen) wrote:
        Take a look at AMD, a public-domain automounter that will be part of
4.4BSD. It transparently handles server failures, multiple backup servers,
and much, much more. It's used by many large sites (including ours) on hundreds
of machines. AMD is available from usc.edu:pub/amd.
[Comment from Alek: amd is a *very* nice program - we use it here on several
types of machines]

jdd@db.toronto.edu wrote:
Here's what I suggest: get the BSD 4.3 source for /usr/ucb/Mail, and compile
that. Hack the code for the "-e" option to check to see if the NFS server is
up before checking /var/spool/mail. You can do this by attempting to
make an RPC connection to the NFS port on the server.
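A rough C sketch of the "is the server up?" test. As an assumption, it substitutes a plain TCP connect with a timeout to port 2049 (the conventional NFS port) for the RPC connection described above; servers that only offer NFS over UDP, as was common, would need a real RPC call to the NULL procedure instead. server_is_up is a hypothetical name, and note that name resolution itself can still stall if the name service is down.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical reachability probe: returns 1 if host accepts a TCP
 * connection on port within timeout_ms, 0 otherwise. This stands in for
 * the RPC check suggested above; it is not an NFS protocol exchange.
 */
int server_is_up(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res, *ai;
    int up = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return 0;                          /* unresolvable: treat as down */

    for (ai = res; ai != NULL && !up; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Non-blocking connect, so each address costs at most timeout_ms. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fd);
            continue;
        }

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            up = 1;                        /* connected immediately */
        } else if (errno == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int err = 0;
            socklen_t len = sizeof err;
            /* Writable within the timeout and no pending error: connected. */
            if (poll(&pfd, 1, timeout_ms) == 1 &&
                getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
                err == 0)
                up = 1;
        }
        close(fd);
    }
    freeaddrinfo(res);
    return up;
}
```

A patched mail -e would call something like server_is_up("mailhost", "2049", 2000) and skip the mailbox check when it returns 0.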

Piete.Brooks@cl.cam.ac.uk wrote:
* I changed the login source to fork before looking for mail.
* If the child doesn't return within a reasonable interval, the parent
* continues ...
* [ Code is complicated by the fact that we have a mix of having mail in a
* central directory (/usr/spool/mail) and in home directories (~/.mail) ]
[Comment from Alek: pls E-mail Piete if you're interested in the source]
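The fork-and-timeout idea can be sketched in C as follows. This is not Piete's code; it is a minimal illustration that assumes "has mail" means a non-empty mailbox file, and check_mail_with_timeout is a hypothetical name. The child does the possibly-hanging stat(); the parent polls for the child's exit and gives up after timeout_sec.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Look for mail in a child process so that a hung NFS mount can only stall
 * the child. Returns 1 if the mailbox is non-empty, 0 if it is empty or
 * missing, -1 if the check failed or did not finish in time.
 */
int check_mail_with_timeout(const char *mailbox, unsigned timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct stat st;
        /* This stat() is the call that may block on a dead server. */
        _exit(stat(mailbox, &st) == 0 && st.st_size > 0 ? 1 : 0);
    }

    struct timespec tick = { 0, 100 * 1000 * 1000 };   /* poll every 100 ms */
    for (unsigned long waited_ms = 0; waited_ms < timeout_sec * 1000UL;
         waited_ms += 100) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Give up. A child stuck on a hard mount may not die until the server
       returns, so do not wait for it; just carry on with the login. */
    kill(pid, SIGKILL);
    return -1;
}
```

login would print the new-mail message only on a return of 1 and continue in every other case.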

gdmr@dcs.edinburgh.ac.uk wrote:
We ditched sendmail in favour of MMDF and now deliver mail directly to
users' home directories, for precisely that reason. At the last count we had
about 20 NFS servers, so there was adequate redundancy for most things, but
not mail.

das@ee.edinburgh.ac.uk wrote:
Shock, horror: we mount our mail partition SOFT, using the
automounter. I have been doing this for over a year without any
problems, and you can log in if the mail server disappears.

Extract from /etc/auto.import (indirect map)

mail -rw,noquota,retrans=12,soft beam-$NET:/usr/spool/mail

The mount point is /import/mail (actually /tmp_mnt/import/mail), and
there is a symbolic link from /var/spool/mail to /import/mail.

poffen@sj.ate.slb.com (Russ Poffenberger) & todd@petadmin.wustl.edu (M. Todd Gamble)
pointed out that mounting /var/spool/mail with -o soft,intr should do the job.
This is what I ended up doing (well, I thought I had tried this and it hadn't
worked, but I'm getting senile in my old age :-). I also add the bg option, so
if mailhost (the alias I use) is down, then I can eventually pick up the mount.
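With soft set, an NFS operation that runs out of retries returns an error to the caller instead of blocking, so a mail check has to treat a failed stat() as "status unknown" rather than "no mail". A minimal C sketch of that classification, with hypothetical names; which errno a timed-out soft mount produces varies between systems (ETIMEDOUT and EIO are common), so the sketch treats any error other than ENOENT as unknown.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

enum mail_status { MAIL_NONE, MAIL_NEW, MAIL_UNKNOWN };

/*
 * Classify a mailbox the way a login-time check might, assuming the spool
 * is soft-mounted so stat() eventually fails instead of hanging.
 */
enum mail_status check_mailbox(const char *mailbox)
{
    struct stat st;
    if (stat(mailbox, &st) == 0)
        return st.st_size > 0 ? MAIL_NEW : MAIL_NONE;
    if (errno == ENOENT)
        return MAIL_NONE;          /* no mailbox file: no mail */
    return MAIL_UNKNOWN;           /* e.g. a soft-mount timeout: skip quietly */
}
```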

We use the automounter quite a bit here, and I think several of the suggestions
in that direction are excellent ones. I have <50 users, so I can probably get
away with the above (plus several people are on non-Suns, to which we forward
the E-mail directly to their machines). If I doubled in size, then I would
probably go with the automount approach, with perhaps some backup mail-servers
that are rdist'ed periodically (amd allows an NFS client to detect the failure
of a server, and mount a backup over that mount point).

Nobody mentioned any problems with a single E-mail server, although I recall
problems in the past, primarily with lockd (?)

Thanx again for all the responses, and I hope my summary is useful,

Alek Komarnitsky
Software Tools Manager, Spatial Technology, Inc.
2425 55th Street, Bldg A, Boulder, CO 80301-5704
303-449-0649
alek@spatial.com

P.S. I suspect this problem, and how it scales, has been discussed at the
     LISA conference - becomes more difficult with thousands of users, eh?



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:06:17 CDT