SUMMARY: Disk contention

From: DAUBIGNE Sebastien - BOR <SDaubigne_at_bordeaux-bersol.sema.slb.com>
Date: Fri Aug 01 2003 - 06:10:04 EDT
Kevin, Michael, Karl, Mike, Steve: thank you for your answers.

Original question:
What are good iostat thresholds for detecting disk bottlenecks?

Answer:

There was a general consensus about the 30 ms service time threshold, but
I'm still convinced that it's not significant for big blocks (say 1 MB I/Os).
A simple sequential read test (dd with a 1 MB block size) shows that svc_t
can reach 30+ ms and %b can reach 99% even with a single reading process.
(You should agree that one single sequential-read process can't be suspected
of generating a disk bottleneck, right?)
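
For reference, a sketch of that kind of test (the device path is only a
placeholder; point dd at a raw slice you can safely read, and watch the
disk with iostat in another window):

    dd if=/dev/rdsk/c0t0d0s0 of=/dev/null bs=1024k &
    iostat -x 30

Even a single sequential reader like this can push svc_t past 30 ms and
%b to 99% on the disk it is reading.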

As stated by Karl:

*	The significant bottleneck threshold is %b (percent time disk busy)
	> 20% AND (20 ms < svc_t (ServiceTime) < 30 ms)

*	The critical bottleneck threshold is %b (percent time disk busy)
	> 20% AND (svc_t (ServiceTime) > 30 ms)

As Karl gave much other tuning advice, I have reproduced it at the end of
this message.
---
Sebastien DAUBIGNE 
sdaubigne@bordeaux-bersol.sema.slb.com - (+33)5.57.26.56.36
SchlumbergerSema - SGS/DWH/Pessac

	-----Original Message-----
	From:	Karl Vogel
	Subject:	Re: Disk contention

	>> On Tue, 22 Jul 2003 18:49:22 +0200, 
	>> "DAUBIGNE Sebastien - BOR" said:

	S> We have a Solaris 2.6/Oracle box which has poor throughput and a
	S> high (from 50 to 100) number of IO busy processes (column "b" of
	S> vmstat).  CPU (50%)/memory (no paging) are OK, so I assume the
	S> poor throughput is due to the disk part.

	   Maybe.  I've included some other things to look at below.

	   First, *strongly* consider upgrading to Solaris 8.  Lots of
	   throughput improvements, different memory management scheme.

	   We have an Enterprise E450 with 1 GB of memory for our main
	   system.  Tuning took awhile because the information is spread out
	   all over the planet, but it runs pretty well now.  Our /etc/system
	   is below.

	   Your directory/inode cache (measured by the dnlc script below)
	   should have a hit rate of at least 90-95%.

	   Add "noatime,logging" to the mount options field in /etc/vfstab
to get
	   the biggest performance and boot time improvement.  You might
have to
	   put in a patch to have logging capabilities under Solaris-6; this
is
	   probably the single biggest improvement you can make.
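
	   For example, a vfstab line with those options might look like
	   this (a sketch; the device and mount point are placeholders):

	     /dev/dsk/c0t0d0s7  /dev/rdsk/c0t0d0s7  /export  ufs  2  yes  noatime,logging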

	S> Also, what is a good interval for iostat samples: 30 sec?  5 min?

	   I've read that 30 seconds is as low as you should go, because the
	   kernel counters aren't updated more often.
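
	   For example (a sketch), this prints extended per-disk statistics
	   every 30 seconds; note that the first report shows averages since
	   boot, so start reading from the second one:

	     iostat -x 30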

	-- 
	Karl Vogel                      I don't speak for the USAF or my company
	vogelke at pobox dot com        http://www.pobox.com/~vogelke

	If a nation expects to be ignorant and free in a state of civilization,
	it expects what never was and never will be.        --Thomas Jefferson

	
===========================================================================
	#!/bin/sh
	# dnlc: print directory/inode cache

	PATH=/bin:/usr/bin
	export PATH

	cmd='
	   BEGIN  { fmt = "%-13s %9d %s\n" }
	   /:.../ { s = substr ($0, 30); printf fmt, $1, $2, s }
	   /perc/ { s = substr ($0, 30); printf fmt, " ", $1, s }
	'

	echo 'Directory/inode cache statistics'
	echo '(See /usr/include/sys/dnlc.h for more information)'
	echo

	adb -k /dev/ksyms /dev/mem <<END | expand | awk "$cmd"
	maxphys/D"Max physical request"
	ufs_ninode/D"Inode cache size"
	sq_max_size/D"Streams queue"
	ncsize/D"Directory name cache size"
	ncstats/D"# of cache hits that we used"
	+/D"# of misses"
	+/D"# of enters done"
	+/D"# of enters tried when already cached"
	+/D"# of long names tried to enter"
	+/D"# of long name tried to look up"
	+/D"# of times LRU list was empty"
	+/D"# of purges of cache"
	*ncstats%1000>a
	*(ncstats+4)%1000>b
	*(ncstats+14)%1000>c
	<a+<b+<c>n
	<a*0t100%<n=D"Hit rate percentage"
	END

	exit 0

	
===========================================================================
	#!/bin/sh
	# getkern: show predefined kernel tunables

	kernelvars () {
	adb -k /dev/ksyms /dev/mem << EFF | \
	  awk '/^[a-zA-Z_-]+:/ { \
	        if (!i) { i++; next } \
	        if ($2 >= 0) { printf "%-20s %s\n",$1,$2; }
	        next } \
	       /^[a-z_-]+[ \t0-9a-f]+$/ { next } \
	        { print }'
	autoup/D
	bufhwm/D
	coredefault/D
	desfree/E
	fastscan/E
	lotsfree/E
	max_nprocs/D
	maxpgio/E
	maxphys/D
	maxuprc/D
	maxusers/D
	minfree/E
	nbuf/D
	ncsize/D
	nrnode/D
	physmem/E
	rlim_fd_cur/D
	rlim_fd_max/D
	slowscan/E
	sq_max_size/D
	swapfs_minfree/E
	tune_t_fsflushr/D
	tune_t_gpgslo/D
	ufs_HW/D
	ufs_LW/D
	ufs_ninode/D
	ufs_throttles/D
	EFF
	}

	kernelvars
	exit 0

	
===========================================================================
	* $Id: etc-system,v 1.3 2001/07/26 20:39:55 vogelke Exp $
	* $Source: /space/sitelog/newmis/RCS/etc-system,v $
	*
	* NAME:
	*    /etc/system
	*
	* SYNOPSIS:
	*    Tailors kernel variables at boot time.
	* 
	* DESCRIPTION:
	*    The most frequent changes are limited to the number of file
	*    descriptors, because the socket API uses file descriptors for
	*    handling internet connectivity.  You may want to look at the
	*    hard limit of filehandles available to you.  Proxies like Squid
	*    have to count twice to thrice for each request: open request
	*    descriptors and an open file and/or (depending what squid you
	*    are using) an open forwarding request descriptor.  Similar
	*    calculations are true for other caches.
	*
	* WARNING:
	*    SUN does not make any guarantees for the correct working
	*    of your system if you use more file descriptors than 4096.
	*    Programs like fvwm (window manager) may have to be recompiled.
	*
	*    If you experience SEGV core dumps from your select(3c) system
	*    call after increasing your file descriptors above 4096, you
	*    have to recompile the affected programs.  The select(3c) call
	*    is known to Squid users for its bad temper concerning the
	*    maximum number of file descriptors.

	* -----------------------------------------------------------------------
	* rlim_fd_cur 
	* Since 8: default 256, no recommendations
	* 
	*   This parameter defines the soft limit of open files you can
	*   have.  Use values above 256 at your own risk, especially if you
	*   are running old binaries.  A value of 4096 may look harmless
	*   enough, but may still break old binaries.
	*
	*   Another source mentions that using more than 8192 file
	*   descriptors is discouraged.  It mentions that you ought to use
	*   more processes if you need more than 4096 file descriptors.
	*   On the other hand, an ISP of my acquaintance is using 16384
	*   descriptors to his satisfaction.
	*
	*   The predicate rlim_fd_cur <= rlim_fd_max must be fulfilled.
	*
	*   Please note that Squid only cares about the hard limit (next
	*   item).  With respect to the standard IO library, you should not
	*   raise the soft limit above 256.  Stdio can only use <= 256 FDs.
	*   You can either use AT&T's sfio library, or use Solaris 64-bit
	*   mode applications, which fix the stdio weakness.
	*
	*   Also note that RPC prior to Solaris 2.6 may break if more than
	*   1024 FDs are available to it.  Also, setting the soft limit to
	*   or above 1024 implies that your license server queries break
	*   (first hand experience).  Using 256 is really a strong
	*   recommendation.
	set rlim_fd_cur = 256

	* -----------------------------------------------------------------------
	* rlim_fd_max 
	* default 1024, recommended >=4096
	* 
	*   This parameter defines the hard limit of open files you can have.
	*   For Squid and most other servers, regardless of TCP or UDP, the
	*   number of open file descriptors per user process is among the
	*   most important parameters.  The number of file descriptors is one
	*   limit on the number of connections you can have in parallel.
	*
	*   You should consider a value of at least 2 * tcp_conn_req_max,
	*   and you should provide at least 2 * rlim_fd_cur.  The predicate
	*   rlim_fd_cur <= rlim_fd_max must be fulfilled.
	*
	*   Use values above 1024 at your own risk.  SUN does not make any
	*   warranty for the workability of your system if you increase
	*   this above 1024.
	set rlim_fd_max = 1024

	* -----------------------------------------------------------------------
	* ufs_ninode
	* default 4323 = 17*maxusers+90 (with maxusers 249)
	* 
	*   Specifies the size of the inode table.  The actual value will be
	*   determined by the value of maxusers.  A memory-resident inode is
	*   used whenever an operation is performed on an entity in the file
	*   system (e.g. files, directories, FIFOs, devices, Unix sockets,
	*   etc.).  The inode read from disk is cached in case it is needed
	*   again.  ufs_ninode is the size at which the Unix file system
	*   attempts to keep the list of idle inodes.  As active inodes
	*   become idle, if the number of idle inodes increases above the
	*   limit of the cache, the memory is reclaimed by tossing out idle
	*   inodes.
	*
	*   Must be equal to ncsize.
	set maxusers = 2048
	set ufs_ninode = 512000

	* -----------------------------------------------------------------------
	* ncsize 
	* default 4323 = 17*maxusers+90 (with maxusers 249)
	* 
	*   Specifies the size of the directory name lookup cache (DNLC).
	*   The DNLC caches recently accessed directory names and their
	*   associated vnodes.  Since UFS directory entries are stored in
	*   a linear fashion on the disk, locating a file name requires
	*   searching the complete directory for each entry.  Also, adding
	*   or creating a file needs to ensure the uniqueness of a name for
	*   the directory, also needing to search the complete directory.
	*   Therefore, entire directories are cached in memory.  For
	*   instance, a large directory name lookup cache size significantly
	*   helps NFS servers that have a lot of clients.  On other systems
	*   the default is adequate.  The default value is determined by
	*   maxusers.
	* 
	*   Every entry in the directory name lookup cache (DNLC) points
	*   to an entry in the inode cache, so both caches should be sized
	*   together.  The inode cache should be at least as big as the DNLC
	*   cache.  For best performance, it should be the same size in the
	*   Solaris 2.4 through Solaris 8 operating environments.
	*
	*   Warning: Do not set ufs_ninode less than ncsize.  The ufs_ninode
	*   parameter limits the number of inactive inodes, rather than the
	*   total number of active and inactive inodes.  With the Solaris
	*   2.5.1 to Solaris 8 software environments, ufs_ninode is
	*   automatically adjusted to be at least ncsize.  Tune ncsize to
	*   get the hit rate up and let the system pick the default
	*   ufs_ninode.
	*
	*   I have heard from a few people who increase ncsize to 30000 when
	*   using the Squid webcache.  Imagine, a Squid uses 16 top-level
	*   directories and 256 second-level directories.  Thus you'd need
	*   over 4096 entries just for the directories.  It looks as if
	*   webcaches and news servers which store data in files generated
	*   from a hash need to increase this value for efficient access.
	*
	*   You can check the performance of your DNLC - its hit rate - with
	*   the help of the vmstat -s command.  Please note that Solaris 7
	*   re-implemented the algorithm, and thus doesn't have the toolong
	*   entry any more:
	* 
	*     $ vmstat -s ...
	*     1743348604 total name lookups (cache hits 95%) 32512 toolong
	* 
	*   Up to Solaris 7, only names less than 30 characters are cached.
	*   Also, names too long to be cached are reported.  A cache miss
	*   means that a disk I/O may be needed to read the directory
	*   (though it might still be in the kernel buffer cache) when
	*   traversing the path name components to get to a file.  A hit
	*   rate of less than 90 percent requires attention.
	*
	*   For an E450 with maxusers = 2048, ~800,000 files:
	*     default ncsize = 128512 which gives about 90% hit rate.
	*     setting ncsize = 262144 gives about 94% hit rate.
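	*
	*   A convenience one-liner for spot-checking the DNLC hit rate (a
	*   sketch; it just pulls the "name lookups" line quoted above):
	*
	*     $ vmstat -s | grep 'total name lookups'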
	set ncsize = 512000

	* -----------------------------------------------------------------------
	* tcp_conn_hash_size
	* default 512
	*
	*   This can be set to help address connection backlog.  During high
	*   connection rates, TCP data structure kernel lookups can be
	*   expensive and can slow down the server.  Increasing the size of
	*   the hash table improves lookup efficiency.  This is the kernel
	*   hash table size for managing active TCP connections.  A larger
	*   value makes searches far more efficient when there are many open
	*   connections.  On Solaris, this value is a power of two and can
	*   be set as small as 256 (default) or as large as 262144, as is
	*   typically used in benchmarks.  A larger tcp_conn_hash_size
	*   requires more memory, but it is clearly worth the extra
	*   investment if many concurrent connections are expected.  This
	*   parameter must be a power of 2, and can be set in the
	*   /etc/system kernel configuration file.  The current size is
	*   shown at the start of the read-only tcp_conn_hash display using
	*   ndd.
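	*
	*   A quick check per the note above (a sketch; the exact output
	*   format varies by release):
	*
	*     # ndd /dev/tcp tcp_conn_hash | head -1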
	set tcp:tcp_conn_hash_size = 32768

	* -----------------------------------------------------------------------
	* noexec_user_stack 
	*   Since 2.6: default 0, recommended: see CERT CA-98.06, or DE-CERT.
	*   Limited to sun4[mud] platforms!  Warning: This option might crash
	*   some of your application software, and endanger your system's
	*   stability!
	*
	*   By default, the Solaris 32-bit application stack memory areas
	*   are set with permissions to read, write and execute, as
	*   specified in the SPARC and Intel ABI.  Though many hacks prefer
	*   to modify the program counter saved during a subroutine call, a
	*   program snippet in the stack area can be used to gain root
	*   access to a system.
	*
	*   If the variable is set to a non-zero value, the stack defaults
	*   to read and write, but not execute, permissions.  Most programs,
	*   but not all, will function correctly if the default stack
	*   permissions exclude executable rights.  Attempts to execute code
	*   on the stack will kill the process with a SIGSEGV signal and log
	*   a message in kern:notice.  Programs which rely on an executable
	*   stack must use the mprotect(2) function to explicitly mark
	*   executable memory areas.
	*
	*   Refer to the System Administration Guide for more information on
	*   this topic.  Admins who don't want the report about executable
	*   stacks can set the noexec_user_stack_log variable explicitly to
	*   0.  Also note that the 64-bit V9 ABI defaults to stacks without
	*   execute permissions.
	* set noexec_user_stack = 1

	*   Log attempted stack exploits.
	* set noexec_user_stack_log = 1

	* -----------------------------------------------------------------------
	* Swap
	*   System keeps 128 Mbytes (1/8th of memory) for swap.
	*   Reduce that to 32 Mbytes (4096 8K pages).

	set swapfs_minfree=4096

	* -----------------------------------------------------------------------
	* Network
	*   Set to 100 Mbps.

	set hme:hme_adv_autoneg_cap = 0
	set hme:hme_adv_100T4_cap   = 0
	set hme:hme_adv_100fdx_cap  = 1
	set hme:hme_adv_100hdx_cap  = 1
	set hme:hme_adv_10fdx_cap   = 0
	set hme:hme_adv_10hdx_cap   = 0

	* -----------------------------------------------------------------------
	* Memory management
	*
	*   http://www.carumba.com/talk/random/tuning-solaris-checkpoint.txt
	*   Tuning Solaris for FireWall-1
	*   Rob Thomas <robt@cymru.com>
	*   14 Aug 2000
	*
	*   On firewalls, it is not at all uncommon to have quite a bit of
	*   physical memory.  However, as the amount of physical memory is
	*   increased, the amount of time the kernel spends managing that
	*   memory also increases.  During periods of high load, this may
	*   decrease throughput.
	*
	*   To decrease the amount of memory fsflush scans during any scan
	*   interval, we must modify the kernel variable autoup.  The
	*   default is 30.  For firewalls with 128MB of RAM or more,
	*   increase this value.  The end result is less time spent managing
	*   buffers, and more time spent servicing packets.

	set autoup = 120

	* -----------------------------------------------------------------------
	*   http://www.sunperf.com/perfmontools.html
	*   
	*   NETSTAT
	*     One key indicator is nocanput being non-zero.
	*   
	*       root# netstat -k hme0
	*       hme0:
	*       ipackets 228637416 ierrors 0 opackets 269844650 oerrors 0
	*       collisions 0 defer 0 framing 0 crc 0 sqe 0 code_violations 0
	*       len_errors 0 ifspeed 100000000 buff 0 oflo 0 uflo 0 missed 0
	*       tx_late_collisions 0 retry_error 0 first_collisions 0
	*       nocarrier 0 nocanput 62 allocbfail 0 runt 0 jabber 0 babble 0
	*       tmd_error 0 tx_late_error 0 
	*       ...
	*   
	*     If this is the case, your streams queue is too small.  It
	*     should be set to 400 per GB of memory.  Put a similar line in
	*     your /etc/system file.  This assumes you have 4GB RAM.
	*     
	*       set sq_max_size=1600

	set sq_max_size = 400

	* -----------------------------------------------------------------------
	*   http://www.london-below.net/~adrianc/2002/cookbook.html
	*   Recipe bufhwm: Large Active Filesystem (>>TB)
	*   Tell tale sign: small hit rate in the buffer cache
	*   Fix: increase bufhwm
	*   Drawback: may consume memory for little benefit
	*   Created: July 19 2001
	*
	*     Tune the default bufhwm value if you have a small hit ratio on
	*     the buffer cache during periods of high activity:
	*
	*       "sar -b 1 10" shows %rcache or %wcache < 90%
	*
	*     A maximum of bufhwm KB of kernel memory is used to cache
	*     metadata information (e.g. block indirection data).  bufhwm
	*     defaults to 2% of system memory, and cannot be more than 20%.
	*     The bufhwm configured on your system can be obtained with:
	*       /usr/sbin/sysdef | grep bufhwm
	*
	*     The requirements for bufhwm should be:
	*       'Sum Total of Active Filesystem Size' / 2M.
	*
	*     For a 100GB filesystem, configure 50MB of "bufhwm" kernel
	*     memory and set bufhwm = 50000 (in units of K).  Our current
	*     setting is about 20 MB:
	*
	*       me% /usr/sbin/sysdef | grep bufhwm
	*       20725760   maximum memory allowed in buffer cache (bufhwm)
	*
	*     We're using 86 GB out of about 203 GB total, so use 50 MB.
	*     Overall hits/lookups are around 98% according to netstat -k:
	*       biostats:
	*       buffer_cache_lookups 127637848 buffer_cache_hits 125365885
	*       new_buffer_requests 0 waits_for_buffer_allocs 0
	*       buffers_locked_by_someone 6131 duplicate_buffers_found 53 

	set bufhwm = 50000
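	*
	*   To re-derive a bufhwm value on another box (a rough sketch
	*   assuming local UFS filesystems and the "size / 2 MB" rule above;
	*   df -k reports kbytes, so dividing the sum by 2048 gives KB):
	*
	*     # df -k -F ufs | awk 'NR>1 {kb += $2} END {print int(kb/2048)}'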

	* -----------------------------------------------------------------------
	*   http://www.london-below.net/~adrianc/2002/cookbook.html
	*   Recipe segmap_percent: Dedicated I/O server on large Dataset
	*   Tell tale sign: small segmap cache hit rate
	*   Fix: increase segmap_percent
	*
	*     Only a portion of memory is readily mapped in the kernel in
	*     "segmap" to be the target of an actual I/O.  For a read or
	*     write call, being or not being in segmap can cause a
	*     performance difference of approximately 20%.  Solaris 8
	*     introduced a new kernel parameter called segmap_percent that
	*     controls the size of segmap.  The segmap is sized to be a
	*     portion of free memory after boot; C17 uses the default value
	*     of 12%.
	*
	*     On a dedicated I/O server it may be beneficial to increase
	*     this value.  This actually consumes little additional memory
	*     for segmap structures (< 1%), but it should be noted that the
	*     segmap portion of the filesystem cache is not considered free
	*     memory.
	*
	*   WARNING: setting this too high can result in a paging storm.
	* set segmap_percent = 20

	* -----------------------------------------------------------------------
	*   http://www.london-below.net/~adrianc/2002/cookbook.html
	*   Recipe ufs_HW: GBs of data written to a file
	*   Tell tale Sign: ufs_throttles keeps increasing
	*   Fix: increase ufs_HW
	*   Created: July 19 2001
	*
	*     UFS keeps track, for each file, of the number of bytes of data
	*     being written to disk.  Those are bytes in transit between the
	*     page cache and the disks.  When this amount exceeds the
	*     threshold ufs_HW, subsequent write(2) calls will be blocked
	*     until enough of the I/O operations complete.
	*
	*     We can set the ufs_HW/ufs_LW parameters to values that should
	*     limit the adverse condition:
	*
	*         ufs_HW should be set to many times maxphys
	*         ufs_LW should be 2/3 of ufs_HW
	*
	*     When throttling happens, a process is blocked for a time on
	*     the order of a physical write, say 0.01s.  This means that a
	*     process can achieve on the order of ufs_HW/0.01s, or
	*     100*ufs_HW, bytes/s.  The default of 384K throttles a process
	*     at around 38MB/sec.
	*
	*     Our ufs_HW is the default (384K); doubling it slowed down
	*     throttling but didn't eliminate it.

	set ufs:ufs_HW = 4194304
	set ufs:ufs_LW = 2796202
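	*
	*   To see whether write throttling still occurs after a change,
	*   watch whether the ufs_throttles counter keeps climbing (a
	*   sketch; the getkern script above reads the same variable):
	*
	*     # echo ufs_throttles/D | adb -k /dev/ksyms /dev/mem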

	* -----------------------------------------------------------------------
	*   http://www.samag.com/documents/sam0213b/
	*   Solaris 8 Performance Tuning
	*   maxphys
	*
	*   The maxphys setting, often seen in conjunction with JNI and
	*   Emulex HBAs, is the upper limit on the largest chunk of data
	*   that can be sent down the SCSI path for any single request.
	*   There are no real issues with increasing the value of this
	*   variable to 8 MB (in /etc/system, set maxphys=8388608), as long
	*   as your I/O subsystem can handle it.  All current Fibre Channel
	*   adapters are capable of supporting this, as are most ultra/wide
	*   SCSI HBAs, such as those from Sun, Adaptec, QLogic, and Tekram.
	*
	*   Try 1 MB for now.

	set maxphys = 1048576
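	*
	*   To confirm the running value after a reboot (a sketch; this is
	*   the same adb check used in the dnlc script above):
	*
	*     # echo maxphys/D | adb -k /dev/ksyms /dev/mem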

	
===========================================================================
	Notes from a Lotus Domino site running on Solaris

	Disk bottlenecks are the most likely bottlenecks.  Here are the
	thresholds you should look for using the different monitoring tools.

	VMSTAT
	  vmstat is one of the simplest and most useful tools because it
	  reports important data in the categories of CPU, memory
	  utilization, and disk I/O.  To see the system activity for 3
	  seconds with a 1-second reporting interval, use:
	  
	    vmstat 1 3

	  In the process (procs) group of statistics, there are two
	  important stats, r and b:

	    r is the number of processes in the CPU run queue.
	    b is the number of processes blocked for resources (I/O, paging,
	      and so forth).

	  In the memory group of statistics, the important stat is sr:

	    sr is the number of pages scanned and can be an indicator of a
	       RAM shortage.

	  The cpu group of statistics gives a breakdown of the percentage
	  usage of CPU time.  On MP systems, this is an average across all
	  processors.

	    us is the percentage of user CPU time. 
	    sy is the percentage of system CPU time.
	  
	  The following is an example of the results of doing a vmstat 1 3.
	  The r, b, sr, us, and sy columns are most important.

	    procs     memory            page            cpu
	    r b w   swap  free  re  mf pi po fr de sr  us sy id
	    0 0 0 354696 10616   0   7  3  0  0  0  0  65 13 22
	    0 0 0 368976  8104   0   9  0  0  0  0  0   0  1 99
	    0 0 0 368976  8104   0   0  0  0  0  0  0   0  0 100

	  * A significant bottleneck threshold occurs if b (processes
	    blocked for resources) approaches r (# in run queue).

	  * A critical bottleneck threshold occurs if b (processes blocked
	    for resources) = or > r (# in run queue).


	IOSTAT
	  You can add the switch -x to provide extended statistics, which
	  makes the output more readable because each disk has its own
	  line.  You can also add the -c switch to report the percentage of
	  time the system has spent in user mode, in system mode, waiting
	  for I/O, and idling.

	  The following is an example of the results of doing iostat -nxtc
	  30 3.  The svc_t, %b, us, sy, and wt columns are most important.

	                      extended device statistics
	      r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
	      3.2    1.0   11.4    3.0  0.0  0.0    0.0    4.6   0   2 c0t0d0
	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t2d0
	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t6d0
	      0.1    1.5    1.1    5.6  0.0  0.0    0.0    6.9   0   1 c2t0d0
	     22.7    0.2 2045.5    0.7  0.0  0.2    0.0    7.0   0  11 c2t1d0
	      0.0    1.3    0.0    5.4  0.0  0.0    0.0    6.6   0   1 c2t2d0
	      0.0    0.1    0.0    0.4  0.0  0.0    0.0    2.9   0   0 c3t0d0
	      0.0    1.5    0.0    5.6  0.0  0.0    0.0    4.4   0   1 c3t1d0
	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t2d0
	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t3d0

	  %b is the percent of time the disk is busy (transactions in
	  progress).

	  The column that I pay most attention to is the wsvc_t column.
	  This is the average service time in milliseconds.  A high number
	  is a sign that the disk is becoming a bottleneck.  A rule of
	  thumb is that >35 is cause for investigation.  Large numbers in
	  the r/s and w/s columns are an indication of a too-small block
	  size.  This could also be a poorly tuned application that is
	  making many small reads/writes instead of a few large
	  reads/writes.

	  The kr/s and kw/s columns give you a good indication of how much
	  bandwidth you are using.  For a single Ultra Wide Differential
	  SCSI disk I would expect to get 10MB/s in throughput.  For a
	  correctly configured stripe, I would expect to get 10MB/s times
	  the number of disks in the stripe.  On a read from RAID 5 you
	  should get similar performance.  On write, the cache will help
	  and you should get close to the same performance as long as the
	  cache is not being overrun.

	  * The significant bottleneck threshold is %b (percent time disk
	    busy) > 20% AND (20 ms < svc_t (ServiceTime) < 30 ms)

	  * The critical bottleneck threshold is %b (percent time disk
	    busy) > 20% AND (svc_t (ServiceTime) > 30 ms)
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers