SUMMARY: SparcStorage Array / Veritas Performance Issues

From: Richard Sands (ras@hubris.cv.com)
Date: Fri Jul 21 1995 - 23:13:19 CDT


Greetings,
        I had a number of interesting replies to my posting about performance problems
with a SparcStorage Array & the Veritas Volume Manager (copies of that post & the full
replies are given at the end of this post).

The responses covered 5 areas:

1. Interpretation of performance figures
   Michael J. Shon (mshon@sunrock.East.Sun.COM) pointed out that:
   "If you have no overload on the disks, but you show LOTS of iowait,
    what you are really seeing is IDLE time. (!!)
    If ANY processor is in iowait, and any other processors are idle,
    the system counts it ALL as iowait.
    It can be argued that this is correct, although it is far from intuitive."
   
   He went on to explain:
   "...a small amount of *real* wait time, when a given processor was actually
    waiting for the disk, plus a large amount of idle/wait time which is counted
    as wait time because some other CPU was waiting for the disk.
    As soon as that CPU gets its disk operation completed, the idle time
    looks like idle time again.
    ...
    The only way to judge a real IO problem is by looking at the disk info.
    Look at the %wait %busy, service time, and actv (on-disk queue length)
    [ like iostat -x reports ]"
   
   This certainly explains the massive iowait values I was getting, but leaves
   me still not knowing why the system performance is poor.

2. Upgrade the Volume Manager
   Pug (pug@arlut.utexas.edu), Chris Terry (Chris_Terry/Computervision@ausns1.cv.com) &
   Damian Murphy (dmur@bssssq.edu.au) said they were using the volume manager in
   configurations similar to mine, with no serious problems. Most suggested I upgrade to 1.9 or
   2.1 though.

3. Check configuration is supported by Sybase
   Al Venz (Al.Venz@seag.fingerhut.com) warned that he'd previously had Sybase refuse
   to support any RAID configurations! Actually Sybase have said they support SSAs under
   Solaris 2.3, & we even had a Sybase consultant in who said this was OK.

4. Upgrade to Solaris 2.4
   Pug (pug@arlut.utexas.edu) pointed out that if we upgraded to Solaris 2.4
   "This will allow for fast-writes which improves things greatly!"
   I'd love to do this as Solaris 2.4 has a number of performance improvements
   for configs like ours. However last I checked Sybase don't support 2.4 for
   our configuration.

5. Wait for Solstice DiskSuite 4
   Chris Terry (Chris_Terry/Computervision@ausns1.cv.com), Kevin Sheehan
   (Kevin.Sheehan@uniq.com.au) & Clive Haworth (hawortc@gb.swissbank.com)
   suggested that DiskSuite will replace Veritas, maybe for Solaris 2.5,
   although Pug (pug@arlut.utexas.edu) believes that Veritas offers
   considerably better functionality.
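
   As a rough illustration of the accounting Michael Shon describes in item 1,
   here is a toy model in Python. It is only a sketch of his description, not
   of how the kernel actually keeps its counters; the state names, function
   names & sample data are all made up for the example.

```python
# Toy model of the iowait accounting described in item 1 (not kernel code).
# Each sample is one clock tick; each CPU is in one hypothetical state:
# "busy", "wait" (really waiting on a disk) or "idle".

def reported_breakdown(samples):
    """Return (busy, iowait, idle) as fractions of total CPU time.

    Assumption taken from the description above: in any tick where at
    least one CPU is waiting on I/O, every idle CPU is reported as iowait.
    """
    busy = wait = idle = 0
    for cpus in samples:
        any_waiting = "wait" in cpus
        for state in cpus:
            if state == "busy":
                busy += 1
            elif state == "wait" or any_waiting:
                # Real waits, plus idle CPUs relabelled as waiting.
                wait += 1
            else:
                idle += 1
    total = busy + wait + idle
    return busy / total, wait / total, idle / total


def real_wait_fraction(samples):
    """Fraction of CPU time in which a CPU was itself waiting on a disk."""
    total = sum(len(cpus) for cpus in samples)
    return sum(cpus.count("wait") for cpus in samples) / total


# Hypothetical 4-CPU machine: two CPUs busy, one waiting, one idle.
samples = [("busy", "busy", "wait", "idle")] * 4

busy, iowait, idle = reported_breakdown(samples)
print(f"reported: busy={busy:.0%} iowait={iowait:.0%} idle={idle:.0%}")
print(f"real wait: {real_wait_fraction(samples):.0%}")
```

   In this made-up case the report shows 50% iowait & no idle time, although
   only a quarter of the CPU time was a processor really waiting for a disk.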

My conclusions are that we should go ahead & upgrade the volume manager anyway,
&, if Sybase OK it, upgrade the OS to 2.4 at the same time. However I suspect
that the root cause of the performance problem may be lurking elsewhere.
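
Following the advice to judge I/O problems from the per-disk figures rather
than from iowait, a check along these lines could be run over saved
"iostat -x" output. This is a minimal sketch: it assumes the output has a
header row with columns named svc_t, %b & actv, the sample figures are
invented, & the thresholds are arbitrary illustrations rather than any
official guidance.

```python
def flag_disks(iostat_text, max_svc_t=50.0, max_busy=60.0, max_actv=2.0):
    """Return names of disks whose figures exceed the given thresholds.

    Columns are located by header name, so the exact column order does
    not matter as long as svc_t, %b and actv are present (an assumption).
    """
    rows = [ln.split() for ln in iostat_text.splitlines() if ln.strip()]
    header = next(fields for fields in rows if "svc_t" in fields)
    flagged = []
    for fields in rows:
        # Skip banner lines, repeated headers and anything malformed.
        if fields == header or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        if (float(row["svc_t"]) > max_svc_t
                or float(row["%b"]) > max_busy
                or float(row["actv"]) > max_actv):
            flagged.append(fields[0])
    return flagged


# Invented sample data, laid out like extended disk statistics.
SAMPLE = """
                                 extended disk statistics
disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b
sd1      40.0  5.0  320.0   40.0  0.0  0.9   20.0   0  45
sd2      85.0 10.0  680.0   80.0  3.5  4.2   81.0  30  97
"""

print(flag_disks(SAMPLE))
```

With the invented figures above only sd2 is flagged; a disk at about 20ms
service time, like ours, would pass.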

Many thanks again to all those who replied, & in particular to Michael Shon
who even replied to some follow-up questions I sent him.

The full text of my original posting follows, accompanied by the full replies
received:

-------------------------- ORIGINAL QUESTION ------------------------------

From ras Tue Jul 18 12:09:13 1995
To: sun-managers@ra.mcs.anl.gov
Subject: Problem: SparcStorage Array / Veritas Performance Issues

Greetings,
        We are using SparcStorage Arrays under the veritas volume manager.
The performance is much worse than I'd expect though. On looking into this
it appears the system spends about 50% of its CPU time in an iowait
state. The service times on the disks are OK (typically 20ms), & the
data volumes aren't high enough to overload the various busses, so my
guess is that the volume manager is somehow adding a large overhead, &
killing performance.

If this is so then I can upgrade to a more recent version of the volume
manager in the hope that this will improve things. However as this will
possibly mean a full system rebuild on a 24 X 7 system I'd rather avoid
this until I'm sure it will help. Has anyone else had any similar
experiences, or any suggestions on how I can get performance to an
acceptable level?

I've heard muttered rumours about performance issues with the veritas
volume manager, & that sun will drop it soon for an enhanced
Online:DiskSuite. If anyone can shed any light on this I'd be very interested.

The details are:
    Hardware : SS1000 (4CPU 256M RAM)
                      Twin 18GB SparcStorage Array model 100s
    Software : Solaris 2.3 (+recommended patches)
                  Veritas volume manager rev. 1.3
    Application : Sybase 4.9.2 on raw volumes (read-intensive)
    Disk Config : All volumes are mirrored.
                      All database volumes are striped & mirrored (64K stripes)
                      using the volume manager.
                      All mirroring occurs across the two arrays.

-------------------------- REPLIES ------------------------------

From: Chris Terry/Computervision
  <Chris_Terry/Computervision@ausns1.cv.com>
Date: 19 Jul 95 13:59:44
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

Richard,

Revision 1.9 of the Volume manager is available for Solaris 2.3. We have a
number of sites with it which are working fine (although one has had a problem
when trying to encapsulate the root/boot disk).

Sun have released Solstice (aka Online) DiskSuite 4.0 which is a full GUI
interface with RAID5 support. It is slated to replace the volume manager in the
near future (read Solaris 2.5 I think).

Chris

---------------------------------------------------------------------------

From: Kevin.Sheehan@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
Date: Wed, 19 Jul 1995 14:15:01 EST
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

Dunno about product plans, but we noticed the high CPU overhead too. My
guess is that with RAID 5 in ODS, there isn't really a reason to use Veritas.

The other nice goodie in ODS is the journalling (log) file system. Faster
for writes, and an almost instant fsck...

                l & h,
                kev

---------------------------------------------------------------------------

From: Damian Murphy <dmur@bssssq.edu.au>
To: Richard Sands <ras@hubris.cv.com>
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

Richard,

I run a very similar config but with only one storage array and an Oracle
db running on ufs filesystems.

Surprisingly the performance difference between ufs and raw was limited in
tests we conducted after getting the array. This may have changed as the
system has more than tripled in twelve months.

My mirroring occurs within the same array.

My service times are on average below yours, around 16-18ms, but this is not
much.

I tried to find our optimum stripe calculations that my R&D dept did for
me at the time but they seem to be lost.

Sorry, this is not much help but a little information can be dangerous.
Interested to hear what info you get.

Wishing you luck..
Damian

---------------------------------------------------------------------------

From: Clive Haworth <hawortc@gb.swissbank.com>
Date: Wed, 19 Jul 1995 09:01:15 +0100

Re: ODS 4.0

This is being released imminently as far as I know.
It has GUI (aka Veritas) but doesn't support dual
hosting / High Availability properly.

I've played around with Veritas for a while now. I
certainly agree with you, performance is appalling,
I had a mirrored setup. This may be the cause ?

Clive Haworth
Open Systems Solutions
c/o hawortc@gb.swissbank.com

---------------------------------------------------------------------------

Date: Wed, 19 Jul 1995 07:14:51 -0500
From: Al.Venz@seag.fingerhut.com (Al Venz)
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

Richard,

You may want to check with sybase and verify that they will support this
configuration. The last I knew sybase refused any support for databases that
were mirrored on the hardware side rather than through sybase. I don't know if
system 10 will be addressing this for sure or not, but a company I used to work
for had an almost identical setup and had a hard time dealing with sybase on any
issues. They simply refused to support RAID disks at that time...

Good luck,
Al

P.S. There really isn't much of a difference in version 2.0 and 3.0 of the
volume manager, but I would definitely get one of them rather than stay with 1.3

---------------------------------------------------------------------------

From: mshon@sunrock.East.Sun.COM (Michael J. Shon {*Prof Services} Sun Rochester)
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

Ah - there's the missing part of the message that I just responded to.
This paints a different picture.

If you have no overload on the disks, but you show LOTS of iowait,
what you are really seeing is IDLE time. (!!)
If ANY processor is in iowait, and any other processors are idle,
the system counts it ALL as iowait.
It can be argued that this is correct, although it is far from intuitive.

You have a system that isn't busy *enough* !
If you are 50% iowait, then 2 of your 4 cpus have nothing to do.
You're getting all the performance you have asked for, and there's
more available if your apps want it.

This usually means that the apps are not multi-process or
multi-threaded enough to keep all of the CPUs busy.

There may be features in the database to parallelize the tasks more,
or there may be ways to run more concurrent jobs against the
database at the same time by partitioning the jobs among more processes.

---------------------------------------------------------------------------

From: mshon@sunrock.East.Sun.COM (Michael J. Shon {*Prof Services} Sun Rochester)
To: ras@hubris.cv.com
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

[As I understand it] the individual wait times here are as I described;
a small amount of *real* wait time, when a given processor was actually
waiting for the disk, plus a large amount of idle/wait time which is counted
as wait time because some other CPU was waiting for the disk.
As soon as that CPU gets its disk operation completed, the idle time
looks like idle time again.

This looks like a bug, but somewhere, someone thinks that it is a feature.

The only way to judge a real IO problem is by looking at the disk info.
Look at the %wait %busy, service time, and actv (on-disk queue length)
[ like iostat -x reports ]

Of course, I could be completely wrong. :-)
You DID read the disclaimer on my signature, didn't you?

|Also do you know where in-depth info about what kernel parameters portray
|can be found?

It's a well-kept secret.
Mere mortals must never know.

But seriously, I'm not completely sure what you mean.
Do you mean tuning parameters?
There is some stuff in the AnswerBooks; tables and descriptions. Not great.
If you get a whiff of something, you can often find more in some bug report
in SunSolve.
Your best source of info on tuning is Adrian Cockcroft's book, available
at technical bookstores, and through SunSoft Press (probably SunExpress too).

|
|By the way, you were also right that the application doesn't scale well in
|MP environments. It (Sybase 4.9.2) spawns a process for each "engine"
|(processor) configured, but the original process is used about 30 times
|more than any of the spawned processes. It also seems incapable of running
|more than one sybase "thread" on a process at once.

Hmmm.
Sun uses Sybase quite a lot internally, and I think that we probably supply
some people to their development staff
(we get quite cozy with database companies).

It surprises me that it scales so poorly.
We/they should have fixed that long ago.
Perhaps your local Sybase guru has overlooked something, or perhaps there
is a newer version ? [ I am not familiar with Sybase releases ]

Good Luck.

---------------------------------------------------------------------------

From: Tom.Rivera@numen.com (Tom Rivera)
To: ras@hubris.cv.com
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

Richard,

Would be interested in the replies that you receive on this.

Have not heard of too many complaints of performance - although
most of our customers are not running in the configuration that you
have specified. Most are using one SS-Array connected to one SS-1000,
and running Oracle 7.x, and using RAID-5 stripes.

We have attempted to upgrade our customers to Veritas VM version 2.1,
and Solaris 2.4.

I have also heard the same rumors that you have.

---------------------------------------------------------------------------

From pug@arlut.utexas.edu Thu Jul 20 14:09:22 1995
Date: Thu, 20 Jul 1995 08:08:54 -0500
From: Pug <pug@arlut.utexas.edu>
To: ras@hubris.cv.com
Subject: Re: Problem: SparcStorage Array / Veritas Performance Issues

In article <199507181109.MAA24659@vera.CV.Com> you write:
>I've heard muttered rumours about performance issues with the veritas
>volume manager, & that sun will drop it soon for an enhanced
>Online:DiskSuite. If anyone can shed any light on this I'd be very interested.

We will kill them if they do. Btw, Solstice Disksuite is still not as
fully functional as Volume Manager is, IMHO.

>The details are:
> Hardware : SS1000 (4CPU 256M RAM)
> Twin 18GB SparcStorage Array model 100s
> Software : Solaris 2.3 (+recommended patches)

Upgrade to 2.4 HW 3/95! This will allow for fast-writes which improves
things greatly!

> Veritas volume manager rev. 1.3

Upgrade to 2.1!

Ensure your microcode and f-code are up to date as provided with 2.1 plus
the patches.

Ciao,

============================================================================
Richard Sands E-Mail : ras@hubris.cv.com
Systems Specialist Phone : +44-(0)1494-429537
Computervision IT Fax : +44-(0)1494-440303
High Wycombe, UK US : 617-275-1800 X1165
----------------------------------------------------------------------------
  These views may not reflect my employers, or even my own for that matter
============================================================================


