SUMMARY--Poor SMD disk performance

From: John Valdes (valdes@geosun.uchicago.edu)
Date: Thu Aug 08 1991 - 19:47:08 CDT


Hello all,

First of all, thanks to those who responded to my query:

 "Ric Anderson" <ric@cs.arizona.edu>
  yih%atom@cs.utah.edu (Benny Yih)
  dan@breeze.bellcore.com (Daniel Strick)
  stern@sunne.East.Sun.COM (Hal Stern - Consultant)
  david@elroy.Jpl.Nasa.Gov (David Robinson)
  markw@utig.ig.utexas.edu (Mark Wiederspahn)
  wallen@cogsci.UCSD.EDU (Mark R. Wallen)
  webber@world.std.com (Robert D Webber)
  matt@wbst845e.xerox.com (Matt Goheen)

Once again sun-managers proves to be educational.

Earlier I reported poor SMD disk access performance from our Sun 3/160 running
4.1 w/ an SMD-4 (Xylogics 7053) controller. Specifically, I was reading a
32MB file in 512-byte chunks (i.e., read(fd,buffer,512)) and found execution
times to be slower than desired (typically 2.0u 52.2s 1:52; my original
request is given below). The problem turned out NOT to be the improper
system configuration I suspected, but rather my own program and the system's
memory limitations.

First, an explanation (naive and oversimplified, I'm sure) of how SunOS 4.X
handles file I/O. When reading a file, the system tries to cache as much of
it as possible in physical memory by keeping the file's pages mapped in
memory once they've been brought in from disk. When the process tries to read
a part of the file which hasn't yet been loaded into memory, a page fault
occurs and the system pages in the requested page; parts of the file that are
already resident are read without any disk I/O at all.
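
If you want to watch this happen from inside a program rather than with
vmstat, getrusage() reports the fault counts (csh's time prints the same
thing as "pf"). Here's a quick sketch -- not part of my original test, and
with error handling pared down:

/*
 * sketch: count the page faults taken while reading a file
 */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
  struct rusage before, after ;
  char buf[8192] ;
  int fd, n ;

  if ( argc < 2 )
    {
      printf("usage: %s file\n",argv[0]) ;
      exit(1) ;
    }

  fd = open(argv[1],O_RDONLY) ;
  if ( fd == -1 )
    exit(1) ;

  getrusage(RUSAGE_SELF,&before) ;
  while ( (n = read(fd,buf,sizeof buf)) > 0 )
    ;                           /* just drag the file through memory */
  getrusage(RUSAGE_SELF,&after) ;

  /* a major fault is a page that had to come in from disk; a file
   * that's already sitting in the cache should show (almost) none */
  printf("major faults %ld, minor faults %ld\n",
         after.ru_majflt - before.ru_majflt,
         after.ru_minflt - before.ru_minflt) ;

  close(fd) ;
  return 0 ;
}

Run it twice in a row on the same file and the second run's major fault
count should collapse, just as in the vmstat output below.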

This is well illustrated by the output from `vmstat 5` on a Sun (4/330) with
72MB of physical memory while reading ~32MB (32,768,000 bytes to be exact)
from a file:

 procs memory page disk faults cpu
 r b w avm fre re at pi po fr de sr i0 i2 i4 i6 in sy cs us sy id
 0 0 0 0 55392 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0100
 0 0 0 0 55392 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0100
[program started]
 0 1 0 0 54904 0 2 288 0 0 0 0 0 0 0 521160 49 75 0 22 77
 0 1 0 0 52976 0 0 448 0 0 0 0 0 0 0 681896 67 120 0 31 69
 0 1 0 0 50544 0 0 496 0 0 0 0 0 0 0 662037 68 127 0 30 70
 0 1 0 0 47960 0 0 504 0 0 0 0 0 0 0 662072 69 130 0 26 74
 1 0 0 0 45328 0 0 504 0 0 0 0 0 0 0 662106 68 130 0 26 74
 1 0 0 0 42720 0 0 504 0 0 0 0 0 0 0 642126 66 129 0 29 71
 1 0 0 0 40216 0 3 472 0 0 0 0 0 0 0 612084 64 121 0 29 71
 0 1 0 0 37752 0 0 472 0 0 0 0 0 0 0 622205 64 122 0 26 74
 0 1 0 0 35272 0 0 472 0 0 0 0 0 0 0 622264 64 123 0 30 70
[notice all the pages read in, and watch free mem get eaten]
 1 0 0 0 32792 0 0 472 0 0 0 0 0 0 0 612275 63 121 0 33 67
 0 1 0 0 30352 0 0 472 0 0 0 0 0 0 0 602305 62 118 0 29 71
 1 0 0 0 27976 0 0 456 0 0 0 0 0 0 0 582349 61 116 1 28 71
 0 1 0 0 25624 0 4 440 0 0 32 0 0 1 0 542495 66 140 0 27 72
[program done: 0.1u 9.1s 1:03 14% 0+240k 4004+0io 4000pf+0w --
 notice (pi ~ 4000 pages) * 8192 bytes/page = 32,768,000 bytes paged in]
 0 0 0 0 24096 0 0 136 0 0 0 0 0 0 0 0 873 28 49 0 0100
 0 0 0 0 23608 0 0 32 0 0 0 0 0 0 0 0 285 12 16 0 0100
 0 0 0 0 23448 0 0 0 0 0 0 0 0 0 0 0 93 7 4 0 0100
[program re-executed: note free mem--the file is still cached in memory]
 1 0 0 0 23360 0 0 0 0 0 0 0 0 0 0 0 31 261 7 0 63 36
 1 0 0 0 23304 0 0 0 0 0 0 0 0 0 0 0 12 424 9 1 84 15
[program done! 0.0u 7.3s 0:07 100% 0+240k 0+0io 0pf+0w --almost 9x faster!]
 0 0 0 0 23352 0 4 0 0 0 0 0 0 0 0 1 13 141 3 0 3 97
 0 0 0 0 23352 0 0 0 0 0 0 0 0 0 0 0 3 48 0 0 0100
 0 0 0 0 23352 0 0 0 0 0 0 0 0 0 0 0 1 19 0 0 0100

The poor "access" performance I was experiencing was due to a combination of
the small read buffer I was using (512 bytes) and the paging overhead from
reading in the 32MB file when there were only 6MB of physical memory free.
The optimal read buffer size seems to the page size, 8192 bytes (or integer
multiples). Apparently, read()ing less than a page incurs more system
overhead. (I'm not exactly sure why, considering that the page is already
cached in memory. Perhaps the system is checking page boundaries or
something??) In any case, regardless of whether I understand what's going on,
the system time is more than halved when using an 8K buffer (0.1u 22.9s 0:47
48% 0+56k 4001+0io 4015pf+0w). This is much more acceptable (considering the
hardware involved).
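
For reference, here's the test program from my original request (quoted in
full below) reworked to read a page per call. This is only a sketch -- I've
swapped the scanf() prompt for a command-line argument and added a
return-value check, but the test is otherwise the same:

/*
 * same disk test, but reading one 8K page per read() call
 */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int i, fd ;
  char buffer[8192] ;           /* one VM page instead of 512 bytes */

  if ( argc < 2 )
    {
      printf("usage: %s file\n",argv[0]) ;
      exit(1) ;
    }

  fd = open(argv[1],O_RDONLY) ;
  if ( fd == -1 )
    {
      printf("Unable to open %s\n",argv[1]) ;
      exit(1) ;
    }

  /* 4000 reads * 8192 bytes = 32,768,000 bytes */
  for ( i=0 ; i<4000 ; i++ )
    {
      if ( read(fd,buffer,8192) <= 0 )
        break ;
    }

  close(fd) ;
  return 0 ;
}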

Even more speed can be squeezed out by using the mmap() and madvise() system
calls to cut the paging overhead further. BTW, SunOS 4.1.1 has been tuned to
handle sequential file access like this more efficiently.
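
In case it's useful, here is a rough sketch of the mmap()/madvise() idea. I
haven't benchmarked this exact code, and it follows the modern manual pages
(SunOS 4 declares mmap() and madvise() with caddr_t and returns (caddr_t)-1
on failure instead of defining MAP_FAILED), so the casts may need adjusting:

/*
 * sketch: read a file sequentially via mmap() + madvise()
 */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
  int fd ;
  struct stat st ;
  char *base ;
  long i, sum = 0 ;

  if ( argc < 2 )
    {
      printf("usage: %s file\n",argv[0]) ;
      exit(1) ;
    }

  fd = open(argv[1],O_RDONLY) ;
  if ( fd == -1 || fstat(fd,&st) == -1 )
    {
      printf("Unable to open %s\n",argv[1]) ;
      exit(1) ;
    }

  /* map the whole file read-only instead of read()ing it */
  base = mmap(0, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0) ;
  if ( base == (char *)MAP_FAILED )
    exit(1) ;

  /* tell the VM system we'll walk the file front to back, so it can
   * read ahead and recycle the pages behind us */
  madvise(base, (size_t)st.st_size, MADV_SEQUENTIAL) ;

  /* touch every byte; this stands in for real processing */
  for ( i=0 ; i<st.st_size ; i++ )
    sum += base[i] ;

  printf("checksum %ld\n",sum) ;
  munmap(base, (size_t)st.st_size) ;
  close(fd) ;
  return 0 ;
}

This avoids both the read() system calls and the copy from the kernel's
cache into a user buffer, and, as I understand it, the madvise() hint keeps
a big sequential read from flushing everything else out of memory.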

Thanks again for all the helpful advice!

John Valdes
valdes@geosun.uchicago.edu

-------------------------ORIGINAL REQUEST BELOW------------------------------

Date: Tue, 6 Aug 91 14:19:33 CDT
From: John Valdes <valdes@geosun.uchicago.edu>
To: sun-managers@eecs.nwu.edu
Subject: Poor SMD disk performance

Hello all,

  We are experiencing pitiful SMD disk access times on our Sun 3/160. We have
a Sun SMD-4 (Xy7053) controlling 3 disks: two Fujitsu M2351 Eagles and one
Fujitsu M2382K. Disk performance is slow on all three.

  I created a 32MB file on each disk and ran the following program through the
csh time command:

/*
 * program to test disk access performance: read a 32MB file
 * in 512-byte chunks
 */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
  int i, fd ;
  char filebuf[256], buffer[512] ;

  printf("Enter filename: ") ;
  scanf("%255s",filebuf) ;
  printf("Reading %s\n",filebuf) ;

  fd = open(filebuf,O_RDONLY) ;
  if ( fd == -1 )
    {
      printf("Unable to open %s\n",filebuf) ;
      exit(1) ;
    }

  /* 64000 reads * 512 bytes = 32,768,000 bytes */
  for ( i=0 ; i<64000 ; i++ )
    {
      if ( read(fd,buffer,512) <= 0 )
        break ;
    }

  close(fd) ;
  return 0 ;
}

Output times, regardless of which disk is accessed, are typically

  1.9u 52.9s 2:01 45% 0+56k 4000+0io 4021pf+0w

Assuming that all the system time is taken by the read() calls, this gives a
woeful 0.60 MB/s read rate (32,768,000 bytes in ~55s of CPU time), which
seems to be independent of the number of bytes read per read call (multiples
of 512 bytes, that is -- I'm assuming reading 512 bytes bypasses kernel
buffering?). Note the large amount of paging, and the number of page faults.
(Should these be telling me something?) No other processes are active while
this test runs, and there's at least 6MB of physical memory (18MB virtual)
available. A typical `vmstat 5` gives:

 procs memory page disk faults cpu
 r b w avm fre re at pi po fr de sr x0 x1 x2 d3 in sy cs us sy id
 0 0 0 0 264 0 0 0 0 0 0 0 0 0 0 0 10 86 15 5 3 91
 0 0 0 0 264 0 0 0 0 0 8 0 2 0 0 0 2 30 4 5 4 91
 0 0 0 0 256 0 1 0 0 8 0 1 0 0 0 0 0 12 1 5 0 95
[process starts]
 1 0 0 0 208 0 0 264 0 248 0 28 1 0 50 0 34 559 13 8 70 22
 1 0 0 0 192 0 0 432 0 416 0 46 1 0 65 0 55 884 20 8 87 5
 1 0 0 0 200 0 0 480 8 472 0 74 5 0 61 0 62 980 25 7 83 10
 1 0 0 0 216 0 0 440 48 416 0 113 8 0 56 0 62 915 30 10 76 14
 1 0 0 0 200 0 0 456 48 448 8 104 8 0 54 0 63 943 31 7 79 13
 1 0 0 0 208 0 0 400 88 408 0 74 14 0 48 0 61 833 35 8 67 25
 1 0 0 0 208 0 0 424 80 424 0 85 8 0 56 0 63 871 37 10 75 15
 1 0 0 0 192 0 0 488 16 472 0 70 0 0 66 0 651009 27 8 87 5
 1 0 0 0 192 0 0 504 0 488 0 66 3 0 62 0 651019 29 6 86 8
 1 0 0 0 192 0 0 520 0 512 0 70 0 0 66 0 661064 24 9 87 4
 1 0 0 0 184 0 0 512 0 512 0 78 1 0 65 0 651057 26 9 87 4
 1 0 0 0 176 0 0 520 0 528 0 93 0 0 66 0 661074 24 10 88 1
 1 0 0 0 184 0 0 520 0 536 0 78 6 0 60 0 651023 34 8 82 10
 0 1 1 0 184 0 0 320 0 320 0 44 9 0 13 0 39 539 28 6 21 73
[process ends]
 0 0 0 0 224 0 0 72 0 72 0 11 1 0 0 0 10 144 8 7 0 93
 0 0 0 0 224 0 0 16 0 16 0 3 0 0 0 0 2 51 3 8 0 91
 0 0 0 0 224 0 4 0 0 48 0 7 1 0 0 0 1 20 2 5 5 90
 0 0 0 0 208 0 0 0 0 0 0 1 0 0 0 0 0 9 0 5 0 95

  A similar test on the SCSI disks of a 4/330 (SunOS 4.1) typically gives:

  0.5u 15.4s 0:23 67% 0+232k 0+0io 0pf+0w

which is about 2 MB/s--much more reasonable (no paging or faults here).

  Does anyone have any ideas on how to speed things up? Could this just be a
noisy cable, or is something improperly configured (or is my read() call
incorrect)? The docs for the controller and disks quote transfer rates of 2.4
MB/s and 1.8 MB/s, respectively (3.0 MB/s for the M2382K, if the controller
were that fast). I realize these quoted rates are ideals, but is 0.6 MB/s the
best we can hope for?

  The machine in question is a 3/160 running an essentially generic SunOS 4.1
kernel (unneeded devices and options deleted). The SMD-4 controller was a
replacement upgrade of a Xylogics 450, but all three disks have been
reformatted using the new controller under 4.1. None of the disks are near
capacity, and none show any significant fragmentation.

Please respond to me and I'll summarize to the list. My apologies for the
length of this request.

Thanks!

John Valdes
valdes@geosun.uchicago.edu


