SUMMARY: Weird script behaviour in Enterprise 4000

From: Sun Manager Account (sunmanager@argos.silvac.pt)
Date: Wed Jan 15 1997 - 10:54:12 CST


Here is the summary to the question I posted a couple of days ago...

The feedback from Jim Harmon <jim@telecnnct.com> makes a LOT of sense to me and,
in my opinion, would clearly explain why the script ends up not doing what it
was supposed to do.

Please check his message, which is included just after my original message.

All in all I got four replies, which are included below.

This is my original message:

> From sun-managers-relay@ra.mcs.anl.gov Mon Jan 13 19:34 GMT 1997
> Date: Mon, 13 Jan 1997 18:13:39 GMT
> From: Sun Manager Account <sunmanager@argos.silvac.pt>
> To: sun-managers@ra.mcs.anl.gov
> Subject: Weird script behaviour in Enterprise 4000
>
> Hello SMs,
>
> Can somebody have a look at this and please explain to me what is going on here?
>
> I have made a very simple shell script that does something similar to what is shown below:
>
> while (some_condition) do
> touch file[var]
> rm file[var]
> increment [var]
> done
>
> This can also be done with a braindead shell script that goes something like:
>
> touch file1
> rm file1
> [repeat the two lines above X number of times, replacing "file1" with "file2", etc.]
>
>
> If you execute one of these scripts inside an empty directory, after it finishes running
> you should end up with the same empty directory you started with.
> The problem is that sometimes, depending on the load of the server at the time the script was
> running, some of the files created by the script will be left in the directory.
>
> We are testing this on a Sun Enterprise 4000 with 6 CPUs and 1Gbyte RAM...
>
> The only thing that occurs to me to explain this behaviour is that sometimes the remove
> command will be executed and fail because the system still is in the process of creating
> the file.
>
> Could this be due to the fact that the system is MultiProcessor, and could therefore
> be executing each command on a different CPU, which would lead, depending on load conditions,
> to the behaviour described above?
>
>
> Thanks in advance for any help,
>
> Fernando Dias
>

These are the replies I got. I have added my comments to clarify or correct some of them.

> From jim@telecnnct.com Mon Jan 13 21:04 GMT 1997
> Date: Mon, 13 Jan 1997 15:48:34 -0500
> From: Jim Harmon <jim@telecnnct.com>
> Mime-Version: 1.0
> To: Sun Manager Account <sunmanager@argos.silvac.pt>
> Subject: Re: Weird script behaviour in Enterprise 4000
> Content-Transfer-Encoding: 7bit
>
> In scripts, each command spawns a child to execute the process
> requested.
>
> So, in your example below, if the touch process goes to sleep before
> completion (due to system use--for example) and the rm process begins
> execution, but doesn't go to sleep, you'll have the two processes
> reversed in execution, leaving the touched file after the remove.
>
> The way to control that would be to write the script so that the rm
> can't execute UNTIL the touch is returned "complete".
>
> One way to do that would be to put an "if" statement around the rm, that
> checks to see if the file is found, and if not, repeats until it finds
> it.
>
> Let me know if you would like some suggestions on how to build the loop,
> I'm afraid that right now I'm trying to get a bunch of things done...
>
> Cool?
>
>

As I said above, I think Jim's reply makes a lot of sense and would explain
the script's behaviour.
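
A minimal sketch of the kind of guarded loop Jim describes might look like the
following. The scratch directory name, the iteration count and the retry limit
are hypothetical values chosen for illustration; they are not taken from the
original script. The loop reports any failing touch or rm, and holds rm back
until the file can be seen (giving up after a few tries).

```shell
#!/bin/sh
# Hypothetical scratch directory and iteration count, for illustration only.
dir="${TMPDIR:-/tmp}/touchrm.$$"
count=50
mkdir "$dir" || exit 1
cd "$dir" || exit 1

i=1
while [ "$i" -le "$count" ]; do
    f="file$i"
    touch "$f" || echo "touch failed for $f" >&2

    # Jim's suggestion: do not run rm until the file is actually found.
    # The limit of 5 tries is an arbitrary assumption to avoid looping forever.
    tries=0
    until [ -f "$f" ] || [ "$tries" -ge 5 ]; do
        sleep 1
        tries=$((tries + 1))
    done

    rm "$f" || echo "rm failed for $f" >&2
    i=$((i + 1))
done

# Count whatever is still in the directory; an empty directory gives 0.
leftover=$(ls | wc -l | tr -d ' ')
echo "files left behind: $leftover"
cd / && rmdir "$dir"
```

Running it with "sh -x", as Rich suggests further down, would also show the
exact file name passed to each touch and rm.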

> From afinkel@pfn.com Mon Jan 13 20:04 GMT 1997
> X-Sender: afinkel@sunrah.pfn.com
> Date: Mon, 13 Jan 1997 14:54:57 -0500
> To: Sun Manager Account <sunmanager@argos.silvac.pt>
> From: Alex Finkel <afinkel@pfn.com>
> Subject: Re: Weird script behaviour in Enterprise 4000
> Mime-Version: 1.0
>
>
> You could try using the wait command in the script which will cause the
> shell to wait for the completion of previous jobs.
>
> The man page for wait(1) gives a good description.
>
> - Alex
>

From the manual pages, it looks like wait(1) only waits for completion of jobs
that are run in the BACKGROUND. This is not the case with my script, as each command
issued is run in the foreground.
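
To make the distinction concrete, here is a small sketch of where wait does and
does not apply. The directory and file names are hypothetical, chosen only for
this illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory, for illustration only.
dir="${TMPDIR:-/tmp}/waitdemo.$$"
mkdir "$dir" || exit 1
cd "$dir" || exit 1

# Foreground commands, as in the original script: there is no background
# job here, so a 'wait' at this point would return immediately.
touch fg_file
rm fg_file

# Background: touch is started with '&', so the script carries on at once.
# 'wait' blocks until that background job has finished, and only then
# is rm run.
touch bg_file &
wait
rm bg_file
bg_status=$?

remaining=$(ls | wc -l | tr -d ' ')
echo "rm exit status after wait: $bg_status, files remaining: $remaining"
cd / && rmdir "$dir"
```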

> From rsk@itw.com Tue Jan 14 01:04 GMT 1997
> From: Rich Kulawiec <rsk@itw.com>
> Subject: Re: Weird script behaviour in Enterprise 4000
> To: sunmanager@argos.silvac.pt
> Date: Mon, 13 Jan 1997 19:28:51 -0500 (EST)
> X-Last-River: Brandywine
> X-Last-CD: The Nields, "Abigail"
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
>
>
> That really shouldn't happen, because once the "touch" is executed, the file should exist.
>
> Have you run this with "sh -x" and/or "sh -xv" to ensure that the [var] is getting updated
> correctly and in a matching fashion for each command that's run inside the script?
>
> >Could this be due to the fact that the system is MultiProcessor, and could therefore
> >be executing each command on a different CPU, which would lead, depending on load conditions,
> >to the behaviour described above?
>
> It darn well had better not be, because if it is, it's an indication of a serious
> bug in Sun's multiprocessor implementation!
>
> ---Rsk
> Rich Kulawiec
> rsk@itw.com
>

And the last reply:

> From jacques.rall@za.eds.com Tue Jan 14 13:05 GMT 1997
> From: Jacques Rall <jacques.rall@za.eds.com>
> To: "'Sun Manager Account'" <sunmanager@argos.silvac.pt>
> Subject: RE: Weird script behaviour in Enterprise 4000
> Date: Tue, 14 Jan 1997 14:57:32 +0200
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
>
> The file is probably only in memory still, and the system hasn't had time
> to do a 'sync'.
>

From what I know, the system 'syncs' by default every 60 seconds. If files
residing only in memory could not be seen by other processes until the next 'sync'
had run, we would get a lot of problems regarding interactivity and coordination
among processes (e.g. a user writing a file to disk and having to wait until
the next 'sync' to be able to do further work on it).

Am I making sense here?
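
The point about 'sync' can be checked with a sketch like the one below, in which
one process writes a file and a separate process reads it back straight away,
with no 'sync' run in between. The directory name, file name and text are
hypothetical, used only for this illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory, for illustration only.
dir="${TMPDIR:-/tmp}/syncdemo.$$"
mkdir "$dir" || exit 1

# Writer: the shell creates the file. No 'sync' is run at any point.
echo "written without sync" > "$dir/note"

# Reader: a separate process (cat) opens the file immediately afterwards.
seen=$(cat "$dir/note")
echo "reader saw: $seen"

rm "$dir/note" && rmdir "$dir"
```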

Regards to you all, and thanks for the help!

Fernando Dias


