SUMMARY (preliminary) + acks. Re: cron dies infrequently

From: Patrick Gosling (jpmg@eng.cam.ac.uk)
Date: Sat Mar 21 1992 - 19:39:50 CST


My original question:
==========================================================================
Configuration: a variety of sun4's (ss1, ss2, 4/330) running SunOS 4.1.1 .

At irregular intervals cron will spontaneously die on one of our machines,
leaving a core dump in /var/spool/cron/atjobs . This is about all the useful
information I can give I'm afraid, as I cannot correlate it with a particular
machine, particular hardware/software configuration, or just about anything
else, except for the fact that someone has to have scheduled an "at" job for
it to happen (I believe).

Could it be a problem with at, the automounter, and users explicitly
referring to /tmp_mnt/home/machine rather than /home/machine ?

I would be very interested if anyone else had seen this, and (obviously)
delighted if someone knew of a patch/workaround. It's happening rarely
enough to be possible to cope with, but frequently enough to irritate the
users who use "at" ...
===========================================================================

First I should apologise for the slight delay in replying to the net - I've
been snowed under for the last three weeks. Further, I'm afraid I don't
have a conclusive solution yet, but if I do manage to do the last few bits
of tracking down, and install something that works on our kit, I'll post
another summary.

Replies were received from
==========================
pek@au.edu.canberra.longinus (Peter Kenne)
Fletcher Mattox <fletcher@edu.utexas.cs>
Dieter Muller <dworkin@com.rootgroup.merlin>
B.C.Hamshere@uk.ac.ncl
unruh@ca.ubc.physics (William Unruh [Unruh])
=========================
to whom many thanks. (I think JANET-style domain ordering has crept into
those addresses - sorry).

Peter Kenne, Fletcher Mattox and Bill Unruh all indicated that they had
seen this occur - the first two didn't run the automounter, and I think all
three were running 4.1.1.

B.C.Hamshere(@uk.ac.ncl) reminded me of the problem with the automounter and
'at', caused by implicit references to automounted directories, which can
cause 'at' jobs to fail. I don't think this is what is causing cron to fall
over, but he sent me a script that deals with the 'at' problem that I will
be happy to forward on to anyone who is interested.
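
One common way to avoid explicit references to the automounter's mount
point is to rewrite a leading /tmp_mnt component before a path is stored or
used. The following is a minimal sketch in C under that assumption. It is
not the script mentioned above, the helper name strip_automount_prefix is
hypothetical, and /tmp_mnt is assumed to be the mount root as in my
original question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the automounter's mount root is /tmp_mnt, as in the
 * original question; adjust if your site uses a different one. */
#define AUTOMOUNT_ROOT "/tmp_mnt"

/*
 * Hypothetical helper (not the script referred to in the text).
 * Returns a pointer into `path` that skips a leading /tmp_mnt
 * component, so /tmp_mnt/home/machine becomes /home/machine.
 * Paths that do not start with that component are returned unchanged.
 */
const char *strip_automount_prefix(const char *path)
{
    size_t n = strlen(AUTOMOUNT_ROOT);

    assert(path != NULL);
    /* Require a '/' after the prefix so /tmp_mntx/foo is left alone. */
    if (strncmp(path, AUTOMOUNT_ROOT, n) == 0 && path[n] == '/')
        return path + n;
    return path;
}
```

Called as strip_automount_prefix("/tmp_mnt/home/machine"), this returns
"/home/machine"; a path such as "/home/machine" comes back untouched.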

Finally, and I suspect closest to the mark, was the following reply from
Dieter Muller, which I have left mostly unedited (apart from stripping out
most of the duplication of my original message) -

----------------------
: [my stuff deleted]
: else, except for the fact that someone has to have scheduled an "at" job for
: it to happen (I believe).

I suspect you'll find it doesn't even correlate with that, if it's the
bug I think it is.

: [my stuff deleted]

There's a case in which cron closes a stdio FILE that it hasn't
opened. This sometimes does nothing, and sometimes causes a core dump
when various uninitialized pointers don't look like they're NULL, and
so they get free(3)'d. I believe Sun has a patch available for this,
which may also be filed under a title to the effect of ``cron runs some
jobs twice.''

Or you could steal cron off of a Solbourne, I fixed it in there long
before 4.1, and carried the fix over. I'm not at Solbourne any more,
so I can make suggestions like that ;-)
----------------------

This looks like the best bet - I haven't managed to track down the relevant
patch number yet, but as I said, I will re-summarise if I manage to find it
and it seems to fix the problem. Given the intermittent nature of the
problem, it may be some time before I can reasonably do so.
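
To make the failure mode Dieter describes concrete: closing a stdio FILE
that was never opened hands fclose() a pointer to garbage. The usual
defence is to keep the pointer initialised to NULL, close it only when it
is non-NULL, and reset it afterwards. The following is a minimal sketch of
that pattern in C. It is not taken from cron's source (which I have not
seen), and the helper name close_if_open is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from cron's source.
 * Closes *fpp only if it refers to a stream that was actually opened,
 * then resets it to NULL so that a second call is a harmless no-op.
 * Returns 0 if nothing needed closing, otherwise fclose()'s result.
 * The caller must initialise the FILE pointer to NULL before first use;
 * an uninitialised pointer defeats the check.
 */
int close_if_open(FILE **fpp)
{
    int rc = 0;

    assert(fpp != NULL);
    if (*fpp != NULL) {
        rc = fclose(*fpp);
        *fpp = NULL;   /* never leave a stale pointer behind */
    }
    return rc;
}
```

A caller would declare FILE *fp = NULL; and pass &fp on every exit path,
whether or not the corresponding fopen() was ever reached.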

Thanks again to all who replied,

-patrick.


