SUMMARY: War stories wanted: 2.3 5/94 auto-install vs. dataless ...

From: Greg Earle (earle@isolar.Tujunga.CA.US)
Date: Wed Aug 24 1994 - 22:57:39 CDT


[Warning: long. Hit "n" now if not interested in gory details ... ]

In article <9408170039.AA29244@isolar.tujunga.ca.us>,
I <earle@isolar.Tujunga.CA.US> wrote:
>Recently a group I work for received a shipment of 20 SPARCstation-20/50SX's.
>Taking the plunge, we decided to try and learn how to do auto-installs for
>all of them, preferably the Right Way. We have a SPARCCenter 1000 for use as
>the auto-install server.
...
>Anyway, all of the systems come more-or-less identical with the 535 Mb disk
>option and 32 Mb. We thought it over and came up with the following scheme:
>
> - configure as a "dataless" system
> - 24 Mb root, 64 Mb swap, 43 Mb /var, 100 Mb /cache for CacheFS
> - remainder of disk for /scr (local scratch space, requested by all)
>
>We've gotten to the point where we can boot a test mule and it comes up all
>the way, but we're not really happy with it. I'm not sure if having a
>dataless setup is the right way to go at this point. At first it seemed to be
>obvious, because it frees up ~75 Mb that can be used for cache and/or scratch
>space, and with only one /usr to administer, it'd be a big administrative win.
>
>But this setup seems to be problematic. Here are some of the issues we've
>faced, and I was wondering if anyone else has scaled these same walls. We
>have to decide which way to go with these installs by the end of the week, so
>it's kind of urgent. Any "war stories" would be appreciated.
>
>... [etc.] ...

Well, as could be expected, the responses were rather diverse in their
suggestions. First off, I'd like to thank the respondents:

Jim Davis <jdavis@cs.arizona.edu>
worsham@aer.com (Robert D. Worsham)
sjk@snowleopard.KaPRE.COM (Scott Kamin)
steve@cegelecproj.co.uk (Steve_Kilbane)
John DiMarco <jdd@cdf.toronto.edu>
Ruth Milner <rmilner@aoc.nrao.edu>
Shane.Sigler@Corp.Sun.COM (Shane Sigler)
Davin Milun <milun@cs.Buffalo.EDU>
annr@reference.collins.co.uk (Ann Rautenbach)
ccwong@se.cuhk.hk (Steven Wong)

Some people recommended that we go dataless as described above; others
suggested we skip the hassle and put a small local /usr on each system, which
avoids a lot of (note: "a lot of", not "all") the patch/package headaches.

We're still not completely done (we got an extension <*grin*>), but at this
point we basically "chickened out" and decided to put a local /usr on each
machine. We made it 75 Mb (55 Mb is actually in use at this point) and made up
for it by shrinking our CacheFS partition, based on input from Shane Sigler.

Having a local /usr does simplify things in a lot of ways, and we were able
to integrate the Maintenance Supplement 1 CD-ROM patches into our auto-install
setup (based on Casper's) reasonably well. We're about 98% of the way to our
goal of being able to turn the machine on, and an hour or so later have it up
and running, ready for use.

However, we have still run into a few gotchas that are keeping us from
getting to 100%. Mostly they have to do with patch installation.

I was under the impression that installing a patch on a "client" system would
install whatever the client needed locally, and would know not to install (or
try to install) the parts that live on a server when a patch's files are split
between client and server. It seems like this is not the case. If we've done
something wrong, please enlighten me.

We're hard-mounting /opt and automounting /usr/openwin from a SPARCserver 1000.
The Maintenance Supplement 1 patches were previously applied to the server.

This caused 4 of the MS1 patches to croak when the client tried to install
them during auto-install:

------------------------------------------------------------------------------
Installing 101262-04...

Installpatch Version 3.8 1/24/94
Generating list of files to be patched...
The following validation error was found:

ERROR: /opt/SUNWits/Graphics-sw/xgl-3.0/lib/pipelines/xglSUNWcfb.so.3
    file size <62684> expected <62780> actual
    file cksum <15197> expected <17472> actual
... etc. ...

Installing 101381-02...

@(#) installpatch 4.4 94/03/10
Executing prepatch script...
This patch is not supported on this system.
The prepatch script exited with return code 1.
Installpatch is terminating.

Installing 101471-01...

The following validation error was found:

ERROR: /usr/openwin/lib/libPEX5.so.2
    file size <893492> expected <895304> actual
    file cksum <36834> expected <33564> actual

Installing 101594-01...

Installpatch Version 3.8 1/24/94

The following validation error was found:

ERROR: /opt/SUNWits/Graphics-sw/xgl-3.0/lib/pipelines/xglSUNWsx.so.3
    file size <730936> expected <752932> actual
    file cksum <43285> expected <6333> actual
... etc. ...
------------------------------------------------------------------------------
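
With twenty clients each producing output like the above, it can help to boil
the logs down to "which patch tripped over which file". The following is a
minimal Python sketch, not part of installpatch itself; it assumes the log
lines look exactly like the excerpts above ("Installing <patch>...", "ERROR:
<path>", "file size/cksum <N> expected <M> actual"), and the function name
parse_validation_errors is made up for illustration.

```python
import re

# Patterns are taken from the log excerpts above; other installpatch
# versions may word things differently (assumption).
PATCH_RE = re.compile(r"^Installing (\d+-\d+)\.\.\.")
ERROR_RE = re.compile(r"^ERROR:\s+(\S+)")
FIELD_RE = re.compile(r"^\s*file (size|cksum) <(\d+)> expected <(\d+)> actual")


def parse_validation_errors(log_text):
    """Return {patch_id: {path: {"size"/"cksum": {"expected", "actual"}}}}."""
    failures = {}
    patch = None    # patch currently being installed
    fields = None   # per-file dict currently being filled in
    for line in log_text.splitlines():
        m = PATCH_RE.match(line)
        if m:
            patch, fields = m.group(1), None
            continue
        m = ERROR_RE.match(line)
        if m:
            fields = failures.setdefault(patch, {}).setdefault(m.group(1), {})
            continue
        m = FIELD_RE.match(line)
        if m and fields is not None:
            fields[m.group(1)] = {
                "expected": int(m.group(2)),
                "actual": int(m.group(3)),
            }
    return failures


if __name__ == "__main__":
    import sys
    for patch_id, files in parse_validation_errors(sys.stdin.read()).items():
        for path in files:
            print(patch_id, path)
```

Patches that fail for other reasons (such as the prepatch script exit seen
for 101381-02) produce no "ERROR:" line and so do not show up in the result.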

101262-04, the XGL Jumbo, didn't install because all of the files are in /opt
and were already patched on the server.

101381-02 is an 8-bit patch for (US) Domestic Solaris (OW support). It failed
because it wanted to put stuff in /usr/lib, but it also wanted to add stuff to
/usr/openwin/lib and /usr/openwin/share/locale. Since the latter places were
already patched on the server, it croaked - and the /usr/lib/locale/en_US
additions did not get made.

101471-01 is the Direct PEXlib 2.1 Jumbo. It only patches libPEX5.so.2 in
/usr/openwin/lib, which again was already patched on the server.

101594-01 (and in a later attempt, 101594-06) is the SX Jumbo. It didn't go
because all the files save one (/usr/lib/libsx.so.1) are in /opt/SUNWits,
which was already patched on the server. The later version, 101594-06,
introduces another file (ddx module) that goes in /usr/openwin, so again that
file was already patched on the server.

This bit us particularly nastily in the case of 101594-06, and we ended up
finding a bug in the Xsun server because of it:

Because the client saw all the files (save one) patched on the server, it
refused/failed to install any of the files in the 101594-06 patch. This rev
contains a module, /usr/openwin/server/modules/ddxSUNWcg14.so.1, which now
contains unresolved references to "sl_ctx_unspamify" and "sl_ctx_spamify".

These symbols are defined in the latest /usr/lib/libsx.so.1 that comes with
that patch. But since the patch didn't install, I was left with the new
ddxSUNWcg14.so.1 module (already patched on the NFS server) alongside the
original libsx.so.1 on the client.

Rebooted the system, and "xdm" failed to bring up the server - just an endless
stream of "NFS Write Error" messages on the console. "Why is it trying to do
an NFS *write*?", I wondered. (/usr/openwin is mounted r/o.) I turned on packet
tracing, and it turned out that it was trying to *write* to the rgb database
(rgb.dir)! Now *that* made no sense, so I turned on r/w perms on /usr/openwin
and rebooted. Still no xdm, but no more NFS Write Error messages. I looked at
rgb.dir on the server and it was trashed. The contents? Would you believe the
ld.so.1 error message from trying to load this new ddxSUNWcg14.so.1 with
the unresolved symbols? Yep - Xsun opens the "rgb.dir" file *read/write*
(big no-no) and then dups the fd onto stdout(!) Along comes an error message
while the server is coming up (like loading in a shared object with
now-unresolved symbols, fer instance), and it gets written to stdout. And
guess where that tries to go ... *sigh*. I spun my wheels for a good
half-(man-)day tracking this down. Arrggh.
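
The failure mode is easy to reproduce in miniature. The following is a small
Python sketch of the mechanism only, not of the actual Xsun code: the function
names, the scratch file standing in for rgb.dir, and the wording of the error
message are all made up for illustration.

```python
import os
import tempfile


def clobber_via_stdout(path, message):
    """Open `path` read/write, dup that fd onto fd 1, then write `message`
    to fd 1 the way a stray loader error would. Returns the file contents
    afterwards."""
    saved_stdout = os.dup(1)           # remember the real stdout
    fd = os.open(path, os.O_RDWR)      # the "big no-no": data file opened r/w
    try:
        os.dup2(fd, 1)                 # fd 1 now refers to the data file
        os.write(1, message)           # "printing" lands at offset 0 of the file
    finally:
        os.dup2(saved_stdout, 1)       # put the real stdout back
        os.close(saved_stdout)
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


def demo():
    # 64 zero bytes stand in for a real database file (hypothetical contents).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 64)
        path = tmp.name
    try:
        # Hypothetical message text, not the real ld.so.1 wording.
        return clobber_via_stdout(
            path, b"loader: unresolved symbol sl_ctx_spamify\n")
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Had the file been opened read-only, the write to fd 1 would simply have
failed instead of overwriting the start of the file. On a read-only NFS mount
the open can succeed and the write fail later, which would be consistent with
the "NFS Write Error" messages seen on the console.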

This leads me to my remaining questions (kudos if you've read this far (-:):

WHY doesn't "installpatch" recognize cases where all (or part) of a patch has
been previously applied? In the case of 101471-01, couldn't it see that the
size/cksum didn't match the original file, but did match the file it was
about to install, and figure out that the patch had already been applied?
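
The check being asked for can be sketched in a few lines of Python. This is
an illustration of the idea, not how installpatch works: the names FileSig,
classify and patch_state are hypothetical, and it assumes the patch knows the
size/cksum of both the file it replaces and the file it installs. The
libPEX5.so.2 "old" and "actual" figures come from the log above; treating the
patch's new file as identical to the actual one is the assumption under test.

```python
from collections import namedtuple

# Size and checksum of one file, as reported in the validation errors.
FileSig = namedtuple("FileSig", ["size", "cksum"])


def classify(old, new, actual):
    """Compare the file on disk against the pre-patch and post-patch versions."""
    if actual == new:
        return "already-patched"
    if actual == old:
        return "needs-patch"
    return "unknown"            # neither version: something else touched it


def patch_state(entries):
    """entries maps path -> (old, new, actual). Returns (overall, per_file)."""
    if not entries:
        raise ValueError("patch lists no files")
    per_file = {path: classify(*sigs) for path, sigs in entries.items()}
    kinds = set(per_file.values())
    if "unknown" in kinds:
        overall = "conflict"
    elif kinds == {"already-patched"}:
        overall = "applied"
    elif kinds == {"needs-patch"}:
        overall = "not-applied"
    else:
        overall = "partially-applied"
    return overall, per_file


# 101471-01: "expected" figures from the log are the old file, "actual" is
# what is on disk; the "new" signature is assumed equal to the actual one.
PEX_EXAMPLE = {
    "/usr/openwin/lib/libPEX5.so.2": (
        FileSig(893492, 36834),   # old (expected by installpatch)
        FileSig(895304, 33564),   # new (assumed)
        FileSig(895304, 33564),   # actual (from the log)
    ),
}

if __name__ == "__main__":
    print(patch_state(PEX_EXAMPLE))
```

With this kind of three-way comparison a mismatch against the old file is
only an error when the file also fails to match the new one.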

In the case of 101381-02 and 101594-01, again, couldn't it figure out that
the patch had already been partially applied, and at least apply the parts of
it that go on the local disk? Or do we have to give each client root r/w
access to both /usr/openwin and /opt on the server, so that each of these
already-patched files can be repatched, again and again? But given that
"installpatch" complains about cksums and file sizes, how would that work
anyway? As soon as the first client patches a file, every client after it
would get cksum/size mismatch errors ... please tell me I'm missing something
obvious ...
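
To make "apply only the parts that go on the local disk" concrete, here is a
rough Python sketch that splits a patch's file list by which filesystem each
path lands on. It is not an installpatch feature: the function names are
hypothetical, and the mount table is hard-coded to match the setup described
above (local / and /usr, with /usr/openwin and /opt coming from the server).
On a real client the table would presumably come from the system's mount
table instead.

```python
def owning_mount(path, mounts):
    """Return the longest mount point that contains `path`.
    `mounts` must include "/" so every path has an owner."""
    best = "/"
    for mnt in mounts:
        prefix = mnt.rstrip("/") + "/"
        if (path == mnt or path.startswith(prefix)) and len(mnt) > len(best):
            best = mnt
    return best


def split_by_locality(paths, mounts):
    """mounts maps mount point -> True for local disk, False for NFS.
    Returns (local_paths, remote_paths), preserving input order."""
    local, remote = [], []
    for p in paths:
        (local if mounts[owning_mount(p, mounts)] else remote).append(p)
    return local, remote


# Assumed mount layout for one of the clients described in this post.
MOUNTS = {
    "/": True,
    "/usr": True,
    "/usr/openwin": False,   # automounted from the server
    "/opt": False,           # hard-mounted from the server
}

# Files named in this post as belonging to the SX Jumbo (101594-06).
SX_FILES = [
    "/usr/lib/libsx.so.1",
    "/usr/openwin/server/modules/ddxSUNWcg14.so.1",
    "/opt/SUNWits/Graphics-sw/xgl-3.0/lib/pipelines/xglSUNWsx.so.3",
]

if __name__ == "__main__":
    local, remote = split_by_locality(SX_FILES, MOUNTS)
    print("install locally:", local)
    print("leave to the server:", remote)
```

Under these assumptions only libsx.so.1 would be installed on the client,
which is exactly the piece that went missing in the 101594-06 case.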

Again, thanks to all the respondents for their input, and thanks in advance
to anyone who has any insights into the mixed-mounting patch scenarios
painted above.

-- 
- Greg Earle                    WWW: http://www-mipl.jpl.nasa.gov/~earle/
  Phone: (818) 353-8695                 FAX: (818) 353-1877 [Call # again if
  Internet: earle@isolar.Tujunga.CA.US			     you get !FAX tone]
  UUCP: isolar!earle@elroy.JPL.NASA.GOV a.k.a. ..!{ames,usc}!elroy!isolar!earle


