SUMMARY: non-problems (re-post)

From: Daryl Crandall (daryl@oceanus.mitre.org)
Date: Mon Jun 03 1991 - 15:28:29 CDT


Sun-managers,

OK, I goofed again. Some of you did not receive the complete summary because
I inserted some lines with a "period" at the beginning of the line. This
terminates the message early in some mailers. I've removed the offending
lines and am reposting it. You should see a SUMMARY and a CONCLUSION followed
by my signature.
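
(Editorial illustration, not part of the original post.) The truncation
described above can be sketched in a few lines. This assumes the usual
trigger, a line consisting solely of a period, which SMTP and some mail
submission programs treat as end-of-message. The function names are
hypothetical, and the "naive mailer" is a simplified model, not any
particular program.

```python
def truncate_at_lone_dot(body):
    """Model a naive mailer: stop reading at a line that is only '.'."""
    kept = []
    for line in body.split("\n"):
        if line == ".":
            break  # everything after this line is silently lost
        kept.append(line)
    return "\n".join(kept)


def dot_stuff(body):
    """Protect a message by doubling any leading '.' (SMTP-style dot-stuffing)."""
    return "\n".join(
        "." + line if line.startswith(".") else line
        for line in body.split("\n")
    )


message = "SUMMARY\n.\nCONCLUSION"
print(repr(truncate_at_lone_dot(message)))             # CONCLUSION is lost
print(repr(truncate_at_lone_dot(dot_stuff(message))))  # whole message survives
```

A receiving program would normally strip the doubled period again; that step
is left out of this sketch.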

OK, here's the summary of the replies to my "non-problems" question. I've
included the original question. The SUMMARY and CONCLUSION follow the
original question:

#############################################################################
ORIGINAL QUESTION:

As an unusual request, I have a question about Sun administration that
involves doing nothing.

I have read in this news group about people installing all sorts of patches
that seem desperately needed.

I administer a system of 45 very heavily used machines and have yet to install
a patch to SunOS-4.1 or SunOS-4.1.1 on Sun3, Sun4, Sun4c architecture
workstations.

Am I lucky, or are these patches so specialized that they are rarely needed?

I have the following configuration:

        ~40 diskless clients (3/50, 3/60, 3/110, 4/110, 4/20, SPARCstation1,2)
        4 servers (3/180, 4/280, 4/390, 4/390).
        Xylogics 451 and 753 disk controllers
        Xylogics 9-track tape controller
        30Mb to 1.4Gb SCSI disks on various servers
        10GB of Fujitsu M2382K 1.0GB SMD disks
        ALM 1 (Systek MTI 800/1600) serial board.
        Exabyte 8mm tape drives, QIC24, QIC150, and 9-track tape drives.
        Non-Sun CD-ROM drive and driver.

We run the following software:

        TeX, X11R4, OpenWindows2.0, SunView, "The Publisher", Matlab,
        Mathematica, ILS, VTI, VLSI, WorkView, TranScript, FORTRAN,
        g++ & other GNU stuff.

We use the audio features of SPARCstations.

We've got several special circuit boards for tv cameras, array processors,
digital signal processors, GPIB controllers, GX frame buffers.

We beat the pudding out of the servers, forcing some of them to double as
compute servers as well as disk servers.

One server alone has 24 diskless clients.

What am I doing right?

What am I overlooking?

Why does my policy of not installing patches until the necessity is absolutely
demonstrated work so well?

Yes, we do have an occasional crash of a server or a client but it can usually
be attributed to pilot error, accidents, or power problems.

If I really have a disaster in the making, I'd like to know about it. Please,
if there is some imminently needed patch for SunOS-4.1.1, I'd appreciate
a timely warning. I'll summarize.

I'd also like to hear from other administrators that are running without
problems and patches. A ratio of positive/negative responses might be
interesting. With my machines running so well (knock on wood) I find it
difficult to appreciate the negative comments about Sun software and/or
hardware.

NOTE: I actually DID install one patch. I used adb to adjust the 'nbuf'
       variable in /vmunix to get better NFS performance. And I DO create
       customized kernels for low memory machines, and machines with special,
       non-standard drivers.

Why don't I "have a fly in my soup" too? :-)

I do want to indicate that I have installed security fixes when they are
identified by CERT. Security fixes are a "demonstrated need" and so are
installed automatically. I don't consider them in the same class as bug fixes.

        Daryl Crandall
        The Mitre Corporation
        daryl@mitre.org
        (703) 883-7278

#############################################################################
SUMMARY:

Approximately 68 messages were received. Most comments were similar to
this: "if it works, don't fix it".

Many people indicated that they DO install the security patches from CERT;
however, some effort should be expended to make sure the notice really
comes from CERT and is not a malicious forgery.

A few people mentioned that they knew of people who fanatically installed
patches, but I received no messages from the fanatics themselves. They must
be too busy installing patches to answer mail :-)

A few people said I must be lucky to have such a well running system.
(I'd like to think my methods had something to do with it. :-)

The patches that are installed usually relate to performance enhancements or
bugs on the following systems. NOTE: this is not a checklist of problems.
These are only summaries of problems that I was able to relate to various
systems from the often sketchy information provided by the respondents.
Remember that only about 6 people reported specific problems related to
known bugs and patches. About 90 percent of the respondents do NOT install
patches except security fixes.

        SunOS-3.5:
                no reports of specific bugs

        SunOS-4.0.1:
                patch city! move to 4.0.3 at least!
                serial I/O problems: get the Sun-sanctioned patches.

        SunOS-4.0.3:
                a few people indicated that they needed the JUMBO patch

        SunOS-4.1:
                no reports of specific bugs

        SunOS-4.1.1:

                Fortran 1.3.1 - to fix problems with writing large records

                TMPFS - to fix problems when compiling data structures in
                Fortran

                NFS - to fix problems with hanging mounts using automount

                NIS - to fix problems with using C2 and yppasswd

                SS2 - hanging with large programs/colour map problems using
                some X programs (leaving you with a blank screen)

                "patch 100228-02 (psig action) which we definitely need, if we
                did rrestore from a just upgraded 4/490 (tapehost) to another
                upgraded 3/260 (diskhost) (or sometimes even a simple rsh)
                it would crash with psig action."

                "we've discovered that under 4.1.1, the stock lockd
                prevents Valid Logic's GED from running. Sun and Valid blame
                each other for the problem (sigh)."

        386i:
                 "the BIG lockd patch" ( a YP problem )
                 mclput bug

        Miscellaneous:
                mclput bug on a YP master

                file locking

                "more than 95% full disk partitions seems to cause problems."

                 (a 4/330 w/GX needed patches to get the GX to work right --
                 so I am told).

                NeWSprint and SPARCprinter

                FORTRAN 1.3.1 (problems with "extremely large, complicated,
                (non-standard?) Fortran programs")
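
(Editorial illustration, not part of the original post.) The remark above
about partitions more than 95% full suggests a simple check. The following
is a minimal sketch using Python's standard shutil.disk_usage; the function
names and the default threshold are assumptions for illustration. The
percentage is computed from total and free space, so it may differ slightly
from what df reports on filesystems that reserve space for root.

```python
import shutil


def percent_used(total, free):
    """Return the percentage of a partition in use, given total and free bytes."""
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total


def partitions_over(paths, threshold=95.0):
    """Return (path, percent) pairs for mount points fuller than threshold."""
    flagged = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        pct = percent_used(usage.total, usage.free)
        if pct > threshold:
            flagged.append((path, pct))
    return flagged


if __name__ == "__main__":
    # "/" is only an example; list whatever mount points matter locally.
    for path, pct in partitions_over(["/"]):
        print("%s is %.1f%% full" % (path, pct))
```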

Several interesting quotes:

        "The only problems we have are usually attached to the ends of our
        users' arms."

        "Patches often have negative side-effects."

        "You tend to hear about problems, not successes."

        "...depends ...on the type of work, with file locking being the
        only real basket case."

        "One reason for not bothering with Sun's patches is that they
        are such a <expletive deleted> mess to install on a big system!"

        "Sun recommends that you install a patch only if you're seeing the
        problem it's intended to fix - i.e., these are cures, not prevention.
        A good reason is that the patches do not go through normal Q/A, with
        the result that patch A will sometimes interfere with patch B in a way
        that can circumvent one of the patches or even make the system
        inoperable."

        "There is a scsi patch for st.o that you will want, eventually with
        your exabytes. Your server can panic in a divide by zero trap."

        "On another note, you should probably patch your mathematica init.m
        to start up X11 before NeWS. The NeWS graphics don't work properly on
        local SS2/GX usage."

        "When you talk about occasional crashes, I get curious. IMHO, crashes
        should never happen on servers, except for compute servers. Crashes
        on file-servers, servers for diskless clients etc. are a
        *loss-of-service*, to be avoided at all cost. (If you are down for
        more than 20-30 minutes, you have a great probability for getting
        'stale NFS handles', requiring more reboots etc. It's less of a
        problem when using automount. But 30 minutes down is no work done.)"

        "Our problems have been mostly pilot-error/power/hardware-based."

        "we got a jumbo patch tape and installed it. Things got so bad after
        the patches were installed that we backed out of almost every one."

        "We have only installed one patch in two years and it was to resolve
        a problem we were having with a lightly used 3/280 running an alpha
        version of Legato's NetWorker. Currently we are at 4.1.1 (a and b)
        for all Sun equipment and don't have any patches installed."

        "o Patches don't get the kind of testing scrutiny and
          regression testing that the major releases do.

         o It's a big world out there; to test a SunOS patch across a
          truly representative selection of Sun machines would require
          as much hardware as a large University has. Sun, with its
          (assumed) internal bias towards the newest machines expertly
          configured, tuned, and maintained, is not representative.

         o Patches are usually created by one person, to fix a certain
          specific problem on a certain specific type of machine while
          under a deadline. More often than not they are a hack on
          top of a patch on top of a mistake, rather than a clean
          solution prompted by a rethinking of the flawed logic.

         o Any given installation has a "personality" all its own; as
          much as we as programmers dislike the idea of the "same"
          software acting differently in different locations, this is
          exactly what happens. It manifests itself in differing code
          paths taken because of differing response times, workloads,
          *cable lengths* and so on. More than once I've had software
          (typically PC networking software) that won't work on one of
          our two subnets, but will work on the other. Eventually it
          seemed to come down to such nebulous things as the machines
          on one subnet answered network queries faster, or had
          differing usages of arp tables that made the PC fall off
          tables in one subnet and not another, or extreme ethernet
          cable lengths made some ethercards intermittent on one
          subnet but not another; all the while none of these problems
          manifested on any Sun."

#############################################################################
CONCLUSIONS:

I was pleased, but not too surprised, to learn that most administrators do not
spend a great deal of time to locate and install patches to SunOS. Most
respondents do not install patches except security fixes.

I was also pleased to find a relatively small number of consistent problems
that should be watched for.

Systems used in a manner that Sun has designed them for seem to work very
well. Developers and edge-of-technology users should be prepared for
problems.

No pending disasters were implied for those using an unpatched SunOS-4.1.1.
Older versions of SunOS may have problems but, depending on usage, may be OK.

The impact of a "crash" differs for various sites. Evaluate your need
for a crash-free environment and act accordingly.

Don't install every patch that comes along. Be prepared to remove the patch
if it doesn't fix your problem or causes others. Be cautious about mixing
patches. Make sure the patch is intended for your particular H/W and OS
mixture.

Verify CERT security patches by contacting CERT to see if they actually
issued the notification.

It was interesting to see some unfamiliar names in this list. Apparently
there is a large community of silent and/or satisfied customers.

Please, I am not a clearing house for these patches. Questions regarding
patches implied by this summary should be researched through the normal
channels.

Several people asked me about the patch to the 'nbuf' variable. If I did
not answer you directly, please ask me again.

#############################################################################

        Daryl Crandall
        The Mitre Corporation
        daryl@mitre.org
        (703) 883-7278


