SUMMARY: NFS/mountd troubles.

From: Christopher Welsh (cris@deakin.edu.au)
Date: Mon Mar 23 1998 - 17:07:16 CST


 

Hello,

Thanks to:

Bismark Espinoza.
Rick Reineman.
David Evans.
Kris Briscoe.
Joel Lee.

All replies were informative, including the tweaks for speeding up server
performance. However, the fix was to restart rpc.nisd on my NIS+ root server.

Here are the replies I received.

From: bismark@alta.Jpl.Nasa.Gov (Bismark Espinoza)

Look in /etc/rpc or /etc/inetd.conf for a Unitdata entry.

If you have 40 diskless clients, your Ethernet traffic
must be very heavy. Find out about network utilization first,
either with software commands or a packet sniffer.
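"Network utilization" here is just the bits actually sent over an interval divided by what the link could carry in that interval. A minimal sketch, assuming you already have two readings of a cumulative byte counter for the interface (from whatever tool or sniffer you use); the function name is hypothetical, and the 10 and 100 Mb/sec link speeds are the ones the card in the original post supports.

```python
def link_utilization_percent(bytes_start: int, bytes_end: int,
                             interval_s: float, link_mbps: float) -> float:
    """Percent of link capacity used between two byte-counter samples.

    Assumes bytes_start and bytes_end are readings of the same cumulative
    counter taken interval_s seconds apart on a link of link_mbps Mb/sec.
    """
    if interval_s <= 0 or link_mbps <= 0:
        raise ValueError("interval and link speed must be positive")
    if bytes_end < bytes_start:
        raise ValueError("counter went backwards (wrapped or reset?)")
    bits_sent = (bytes_end - bytes_start) * 8
    capacity_bits = link_mbps * 1_000_000 * interval_s
    return 100.0 * bits_sent / capacity_bits


if __name__ == "__main__":
    # 7.5 MB moved in 10 seconds on a 10 Mb/sec link
    print(link_utilization_percent(0, 7_500_000, 10, 10))
```

The same 7.5 MB in 10 seconds is 60% of a 10 Mb/sec link but only 6% of a 100 Mb/sec one, which is why it matters which speed each port actually negotiated.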

From: Rick Reineman <reineman1@raiders.llnl.gov>

Start slowly increasing the number of nfsd daemons until further
changes do not seem to improve things. You mentioned some rpc.ttdbserverd
errors. I've never experienced your particular problem, but I have had
problems with the TT databases. Start by removing the TT_DB directory from
the filesystem mountpoint; Solaris will recreate it. You can check the
rpc.ttdbserverd man page for the options. There is one to clean up the
TT_DB, but it never works for me.

Definitely bump up the number of nfsd's; the Sun docs for NFS tuning give good
rules of thumb. Right off I'd increase it to 64. You can also increase
ncsize and ufs_ninode in /etc/system, which will give you a better directory
cache hit rate. Here's what I use:

set ncsize=21872
set ufs_ninode=21872

I'd suggest starting yours around 15,000. You might also check out the SE
toolkit from Sun's web site.
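Rick's two /etc/system settings can be produced for whatever size you settle on, and whether the bigger cache pays off can be judged by comparing the directory name lookup cache hit rate before and after the change (/etc/system changes take effect at the next reboot). A minimal sketch; both function names are hypothetical, and it assumes you can get a hit count and a total lookup count for the name cache (on Solaris, `vmstat -s` prints a "total name lookups" line with a cache-hit percentage).

```python
from typing import List


def etc_system_lines(size: int) -> List[str]:
    """Return /etc/system entries setting ncsize and ufs_ninode to the same size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [f"set ncsize={size}", f"set ufs_ninode={size}"]


def hit_rate_percent(hits: int, lookups: int) -> float:
    """Cache hits as a percentage of all lookups."""
    if lookups <= 0 or not 0 <= hits <= lookups:
        raise ValueError("need lookups > 0 and 0 <= hits <= lookups")
    return 100.0 * hits / lookups


if __name__ == "__main__":
    # The starting point Rick suggests
    for line in etc_system_lines(15000):
        print(line)
```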

From: "DJEVANS.AU.ORACLE.COM" <DJEVANS@au.oracle.com>

These can be hard to track down. From my time in ITS, I know of several
possible causes. Have you tried trussing mountd? If it's stopped on
a door() call, you may be getting problems with an NSCD lookup error.
Mike Battersby can describe this to you in more detail.
 
The timeouts may be too low if all the machines try accessing
/export/root at once.
 
Also check the bugtraq archives for exploits on statd/lockd. These
may well show up with the errors you're seeing.
 
I'd try showmount -e on the ELCs and turin to see if some of the
students aren't trying to be funny buggers.
 
However, as first options I'd try trussing the process while it's in the
failed state, and checking bugtraq.

djve

From: Kris Briscoe <brisco_k@adm-srv.sat.mot.com>

CHRIS,

Definitely tweak the nfsd. The default of 16 is OK under low load with
a minimal number of clients. Unlike SunOS 4.x, the Solaris 2.x nfsd is
multi-threaded and presents low overhead on the server with multiple
concurrent threads configured. Just remember that you cannot go above
1024 yet... 32-bit limitation...

What version of Solaris? One of my servers is running 2.4 and I have
the threads set at 512 for 150 clients... no performance issues. If you
are not sure, I would look at going to 32 and then increasing by 16
until you see the performance you are looking for.

Luck,
Kris
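Rick's "increase slowly until changes do not seem to improve" and Kris's "go to 32, then increase by 16, never above 1024" describe the same stepwise search. A minimal sketch; `measure` is a hypothetical callback you would supply (for example, restarting nfsd with n threads and scoring a representative client workload, higher is better), and the 5% "no longer improving" threshold is an arbitrary assumption, not a figure from either reply.

```python
from typing import Callable, List, Tuple


def step_up_until_flat(measure: Callable[[int], float],
                       start: int = 32, step: int = 16, limit: int = 1024,
                       min_gain: float = 0.05) -> Tuple[int, List[Tuple[int, float]]]:
    """Raise the nfsd thread count stepwise until the gain flattens.

    measure(n) returns a non-negative, higher-is-better score observed with
    n threads. Stops when a step improves the best score by less than
    min_gain (as a fraction), or when the next step would exceed limit.
    Returns the best thread count seen and the (n, score) history.
    """
    if start > limit or step <= 0:
        raise ValueError("need start <= limit and a positive step")
    best_n = start
    best_score = measure(start)
    history = [(best_n, best_score)]
    n = start + step
    while n <= limit:
        score = measure(n)
        history.append((n, score))
        if score <= best_score * (1 + min_gain):
            break  # improvement too small: stop raising the count
        best_n, best_score = n, score
        n += step
    return best_n, history
```

In practice each measurement means reconfiguring nfsd and letting the clients load it for a while, so the loop is better read as a checklist than as something to run unattended.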
 

From: Joel Lee <jlee@thomas.com>

I think this is Bug ID 4022742, although I didn't see any patch that would
fix it.

-- Joel
jlee@thomas.com

My original post:

>For the past few mornings I've been arriving to work finding a long queue of
>lecturers ready to hit me over the head. Seems that mountd on our server is
>failing. ps reveals that the daemon is still running but not responding. This
>server serves two labs full of diskless ELCs with Solaris 2.6, all on switched
>ethernet. Here is the list of installed patches to date:

>105160-01 105357-01 105407-01 105497-01 105552-01 105630-01 105837-01
>105214-01 105361-02 105416-01 105516-01 105558-01 105633-02
>105216-01 105379-01 105426-01 105518-01 105566-01 105665-01
>105222-01 105393-01 105464-01 105524-01 105580-01 105669-01
>105284-03 105397-02 105472-01 105528-01 105618-01 105718-01
>105338-04 105405-01 105492-01 105529-01 105621-01 105746-01

>/var/adm/messages reveals the following messages:

>Mar 18 09:22:42 turin unix: WARNING: nfsauth: RPC: Unitdata error
>Mar 18 09:23:43 turin unix: WARNING: nfsauth: mountd not responding
>Mar 18 09:24:47 turin unix: WARNING: nfsauth: mountd not responding
>Mar 18 09:30:11 turin last message repeated 5 times
>Mar 18 09:40:35 turin /usr/dt/bin/rpc.ttdbserverd[3811]:
>_Tt_db_server_db("/export/root"): 4 (/export/root/TT_DB/file_table)

>What is a Unitdata error?
>Has anyone else had problems with this kind of thing? How have you fixed it?

>For now I'm restarting mountd and then rebooting all 40 workstations, a small
>group at a time.

>Oh yeah. The server is an SS5 Model 70 with 160 MB RAM. Software RAID level 5
>is also running. Ethernet is:

>Sun Fast Ethernet (SUNW,501-2739) 10/100 Mb/sec Ethernet with fast wide SCSI.

>I've looked at patch 105615-03 (/usr/lib/nfs/mountd patch) but believe it's
>not applicable.

>This server had been up 56 days without anything going wrong. The only thing
>I'm running on it that is different is a script that calls other hosts many
>times via ssh. One more thing: do you think I should tweak nfsd to allow more
>than 16 concurrent NFS requests? Any other tweaks for servers that serve
>diskless workstations?
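To see how often mountd has been hanging, one can count the "mountd not responding" warnings in /var/adm/messages. A minimal sketch, assuming the syslog format quoted above, where a "last message repeated N times" line stands for N more copies of the message just before it; the function name is hypothetical.

```python
import re
from typing import Iterable

_REPEAT = re.compile(r"last message repeated (\d+) times?")


def count_message(lines: Iterable[str], needle: str) -> int:
    """Count log lines containing needle, expanding syslog repeat lines."""
    count = 0
    last_matched = False
    for line in lines:
        repeat = _REPEAT.search(line)
        if repeat:
            # A repeat line refers to the previous message; it does not
            # change what that previous message was.
            if last_matched:
                count += int(repeat.group(1))
            continue
        last_matched = needle in line
        if last_matched:
            count += 1
    return count


if __name__ == "__main__":
    with open("/var/adm/messages") as log:
        print(count_message(log, "mountd not responding"))
```

For the excerpt above this counts seven "mountd not responding" warnings: the two explicit lines plus the five repeats.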

regards,
Christopher Welsh

-- 
-------------------------------------------------------------------------------
Christopher Welsh (cris@deakin.edu.au)	      :	Deakin University
Computer Systems Manager	              :	School of Computing and Maths
http://www.cm.deakin.edu.au/~cris             : Waurn Ponds Campus, Geelong
Disclaimer: I give no guarantees  :) Voice: 61 52 272878, Mobile: 61 0418 319262
Everybody has enormous potential........ if only you could see what I can see.
-------------------------------------------------------------------------------


