Summary: Oracle/kernel tuning questions

From: Ju-Lien Lim (julienlim@rocketmail.com)
Date: Sun Nov 02 1997 - 22:58:15 CST


My original question:
I've got a server with 1 Gig of RAM and shmmax is
set to 16 MB. Can anyone give me some
suggestions as to which parameters I should increase
for performance? Also, can anyone point me to a
searchable archive for Oracle where I might find
such information?

My thanks to the following people for their valuable
insight and suggestions:
 
    Joel Lee <jlee@thomas.com>
    Bill Walker <bwalker@sfa.org.uk>
    Viet Hoang <vhoang@lucent.com>
    Glenn Satchell <glenn@uniq.com.au>
 
Please see attached for the summary. Thanks again.

         Ju
         julienlim@rocketmail.com
 
> ---
>
> There's an FAQ posted to comp.databases.oracle newsgroup every month or
> so. It is also available via anon ftp from rtfm.mit.edu, the home of
> all FAQs.
>
> ---
>
> That seems very low. We set ours to 128MB so that the DBAs can make their
> SGAs bigger to buffer more data. It all depends on the type of db you're
> running, of course.
>
> ---
>
> You can set SHMMAX to anything up to 2GB; it does not have any adverse
> effect on performance. The general rule of thumb is that it should be
> greater than your SGA and, sensibly, about 75% of your physical RAM.
>
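The rule of thumb above can be turned into a quick calculation. The following
is only a sketch in Python: the helper name suggest_shmmax, the 256 MB SGA
size, and the exact 2 GB ceiling are assumptions made for illustration, not
values taken from the replies.

```python
MB = 1024 * 1024

def suggest_shmmax(ram_bytes, sga_bytes, fraction=0.75, ceiling=2 * 1024 * MB):
    """Return a candidate shmmax: a fraction of RAM, capped at a ceiling.

    The 75% fraction and the 2 GB cap follow the rule of thumb quoted
    above; both are guidelines rather than hard limits checked here.
    """
    candidate = min(int(ram_bytes * fraction), ceiling)
    if candidate <= sga_bytes:
        # The rule of thumb says shmmax should be greater than the SGA.
        raise ValueError("candidate shmmax does not exceed the SGA size")
    return candidate

if __name__ == "__main__":
    ram = 1024 * MB   # the 1 Gig server from the original question
    sga = 256 * MB    # hypothetical SGA size, for illustration only
    print(f"set shmsys:shminfo_shmmax={suggest_shmmax(ram, sga)}")
```

The printed line has the same form as the /etc/system entries shown in this
reply; treat the number as a starting point to review with the DBA.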
>
> We have Oracle 7.3.3 on Solaris 2.5.1 on an 8-way SPARC1000E with 756mb RAM.
> Here are our /etc/system entries:
>
> *** Set Shared Memory / Semaphores for Oracle
> set semsys:seminfo_semmni=200
> set semsys:seminfo_semmns=200
> set semsys:seminfo_semmsl=120
>
> set shmsys:shminfo_shmmax=499108864
> set shmsys:shminfo_shmmin=1
> set shmsys:shminfo_shmmni=512
> set shmsys:shminfo_shmseg=10
>
> forceload: sys/msgsys
> forceload: sys/shmsys
> forceload: sys/semsys
>
> HOWEVER, see attached file for what the experts say.
>
> --------------------------------- Cut Here ---------------------------------
>
> Optimizing and Measuring the Solaris Kernel For Large Oracle Servers.
> by Mike Jaffee, Sun Microsystems
>
> The first part of the paper will discuss the basics of Solaris Internals that
> are relevant to the Oracle DBA along with tips to common technical questions
> and relevant header files. The second part is quoted tuning information taken
> from Sun Experts. The final part is a discussion of kernel memory allocation,
> how to measure it, and some things that can be done to prevent starvation.
>
> Solaris Internals
> Sparc has two rings of execution. The inner ring is for kernel functions and
> the outer ring is for user process functions. The process address space is
> virtual, and normally only part of a process is in physical memory. The kernel
> stores the contents of the process address space in physical memory, on-disk
> files, and specially reserved swap areas. Over time the kernel shuffles pages
> of the processes between physical memory and disk. Each process has registers
> that are stored in the kernel and are placed in the hardware registers at run
> time. A process must block if it is waiting for a resource and allow another
> process to run. The kernel allows each process a brief period of time, usually
> 10 milliseconds, to run before performing a context switch. (Vahalia p.20-25)
> On startup once the kernel is loaded, user processes can request system
> services from the kernel through the system call interface. If the process
> misbehaves by dividing by zero or overflowing its stack, a hardware exception
> occurs, and the kernel intervenes, usually aborting the process. Interrupts
> come from peripheral devices usually indicating a status change or I/O
> completion. Two important processes that manage memory are the swapper and
> pagedaemon. (Vahalia p.22-25)
>
> Each process has a virtual memory address space (VMA) that is translated to
> physical memory addresses by page tables. This mapping is done by the chip's
> MMU. (Tip - System panics can be either hardware or software related. The MMU
> registers give helpful hints on what actually caused the panic.) In addition
> to kernel and user mode, there is kernel and user space. This refers to
> regions in the virtual memory address space of the process. There is only one
> kernel and many processes and hence every process must map in a single kernel
> address space. The kernel portion of the VMA maintains global data structures
> and some per process objects. These can only be accessed by the kernel when
> the chip is running in kernel mode (ring 0). Since the kernel is shared by all
> processes, kernel space must be protected from user-mode access. This is done
> by requiring the processes to use the system call interface. This requires the
> chip to go into kernel mode, transfer program control to the kernel, have the
> kernel execute system code instructions, then switch back to user mode and
> user control of the process. (Vahalia p.22-23)
>
> System Services
> Oracle uses many Solaris system services such as file and record locking,
> inter process communications, virtual memory, and process scheduling. Common
> system calls are open, read, write, fcntl, kill, priocntl, plock, memcntl,
> and sync. Common signals are SIGSEGV - usually means user stack overflow,
> SIGBUS - out of the process address space, SIGTERM - user has "hung up"
> without exiting gracefully, SIGUSR1 - defined signal for asynchronous events,
> and SIGKILL - kill process immediately no exceptions. Oracle uses file and
> record locking by setting read write locks on portions of a file. Any process
> can read a file that is locked but only the owner of the lock can update the
> file. A write lock is sometimes called an exclusive lock and a read lock is
> sometimes called a shared lock. Process scheduling is usually managed very
> well by the kernel, however a slow job can be speeded up by the priocntl
> system call. (System Services Guide p.1-25) Jim Skeen of Sunsoft - "Oracle
> gets locked-down memory as a consequence of using intimate shared memory
> (ISM), not through plock. It controls sharing inside shared memory through
> latches, not memcntl or plock." He also cautions against changing the
> priority of the Oracle processes: "This is something we in DBE actually
> strongly discourage. Only the most daring and knowledgeable DBAs should
> attempt this. The problem is that system threads can get starved if Oracle
> processes are not "well behaved" when running in real time class. Oracle
> processes may easily hog a cpu for extended periods of time (time being
> measured in Unix quantums). We in DBE have experimented with changing the
> dispatch table in useful/clever ways, to minimize the number of involuntary
> context switches. But Oracle processes still run in TS class." (private
> letter Skeen)
>
> Oracle Internals and Solaris System Services
> Mark Johnson of Oracle and Jim Skeen provide the following expert insight and
> information. The system global area is defined as "One or more shared
> segments visible to all Oracle processes that are used to store precompiled
> SQL and PL/SQL (library cache), database buffers (buffer cache), and for
> interprocess communication" (Johnson). As far as process control - "Oracle
> does use semaphores, but latches are the usual synchronizing mechanism, as
> mutexes implemented as spin locks" (Johnson). On the subject of locks "Oracle
> maintains database transaction integrity through use of database locks of
> various sorts--shared read, exclusive read, exclusive write, etc. These are
> implemented through database locks, not using Unix file locks. Thus, the
> scope of a database lock can be limited to a single row in the database. Or,
> the database may choose to lock a database page (which may be quite a bit
> smaller than a Unix page). Or, the database may choose to lock an entire
> database table (which may be composed of multiple database files, which in
> turn may or may not map into Unix files)." (private letter Skeen).
>
> Oracle uses heavyweight processes that are in the shared memory portion of the
> process address space. The DBWR (data buffer writer) process uses aio threads
> known as light weight processes (LWP). An LWP is a kernel-supported user
> thread that is based on kernel threads. They are independently scheduled and
> share the address space of the process. Vahalia's book has a nice discussion
> on LWPs. (Jaffee) Kernel Asynchronous I/O and Intimate Shared Memory are two
> key technologies used by Oracle on the Solaris platform.
>
> Asynchronous I/O is needed because a single blocking thread in a
> multi-threaded application causes all threads to wait until the thread wakes
> up. What needs to happen is for the thread to issue an asynchronous I/O
> request and then pass control to another thread in the process. Also heavy
> I/O is not efficient when done synchronously because of the large number of
> context switches that must occur every time a thread is blocked. (Hyuck Yoo)
>
> Asynchronous I/O under Solaris is implemented in two ways: under Solaris 2.3
> it uses the library, and under Solaris 2.4 and beyond it is in the file
> system layer of the kernel. The library approach uses kernel-level threads
> where each I/O request is handled by a newly created kernel-level thread that
> acts synchronously (i.e. issuing read and write calls). The library lives
> outside of the kernel and the kernel threads that perform the I/O are
> separate from the calling process. The kernel approach is much more
> sophisticated and efficient. The basic concept is to not maintain the queue
> in user space but to put the request directly into the device driver queue.
> The biowait function is bypassed (which is the device driver equivalent to a
> blocking function) and the thread transfers control rather than sleeping in
> the kernel. The kernel has buffers with slots called AIO that maintain a
> listing of all I/O requests. (Hyuck Yoo)
>
> Solaris has provided the ISM feature since 2.2. The main feature of ISM is
> that, in addition to sharing the "memory" pages (like normal shared memory),
> it also shares the page table entries for those pages (therefore, it's
> "intimate"). Another side feature, which is more important for this
> discussion, is that ISM also locks down the shared memory segment in real
> physical RAM. Since the main purpose of ISM is for the DBMS products' buffer
> cache usage, this makes sense. (Jaffee)
>
> Sharing page table entries solves the problem of page table stealing which is
> expensive because all the pages mapped in the stolen page table have to be
> flushed before being given to another process. This avoids the condition
> where the whole system may thrash as processes steal page tables from each
> other. (H. Yoo)
>
> The design team created a new segment in the process address space called
> segshm so that they could create one set of page tables for a shared memory
> segment and share the page tables among the processes that attach that same
> shared memory. In addition to saving page table allocation, sharing page
> tables has other advantages such as having a higher cache hit rate on memory
> map lookups because the tables are in a buffer cache rather than in memory.
> It also reduces the amount of overhead in the hardware address translation
> layer since it no longer needs to go through page tables for every process
> to monitor whether a page has been modified. These are both huge savings and
> speed up the virtual memory paging algorithm within Solaris. (H. Yoo)
>
> IPC
> The Oracle RDBMS is a complex program that uses multiple cooperating processes
> that must communicate with each other and share resources. The kernel provides
> a mechanism in user space called inter process communication or IPC. The
> processes operate in a shared memory segment such that if one process modifies
> data it will be immediately visible to the other processes. Data transfer and
> event notifications occur between the various Oracle processes in the Oracle
> SGA. Semaphores are used for Oracle's own locking and synchronization
> scheme. Asynchronous events such as errors are reported to the processes
> using signals. The default action for most signals from the kernel is to
> terminate the process, however the process may specify an alternate response
> by providing a signal handler function. (Tip - Before installing the kernel
> jumbo patch read the readme file to see if there are any known signal
> problems with Oracle). (Vahalia - p150)
>
> The relevant IPC system calls Oracle makes are shmget, semget, shmat, shmdt,
> shmctl, and semctl. The ipc information is stored in the kernel with the
> ipc_perm structure. shmget(key, size, flag) creates a portion of shared memory
> (which will be the size of the Oracle SGA) and shmat(shmid, shmaddr, shmflag)
> attaches the region to a virtual memory address of the process. (shmsys is
> how Oracle sets up the intimate shared memory segment). The structure of a
> shared memory segment includes access permission, segment size, the PID of
> the process performing the last operation, and the memory map segment
> descriptor pointer as well as other fields. (tip - sgabeg in the ksms.s file
> is a virtual address not physical address (0-0xffffffff = 2 GB). Choose
> small beginning addresses for large SGAs. Also watch out for 28 bit Sparc
> chips. They have smaller virtual addresses. Hal Stern notes "They're really
> not 28 bit chips, but instead the system architecture only passes 28 bits
> of virtual address space on to the memory bus." [private letter]) Once
> attached the region may be accessed like any other memory location without
> requiring system calls to read or write data to it. Hence shared memory is
> the fastest mechanism for processes to share data. (Tip - don't be confused
> by the SZ field in ps -elf. It is in 4 KB pages and represents shared memory
> in the case of Oracle. For example Oracle may have 60 server processes in a
> shared memory segment all approximately 25000 4 KB pages. A common
> misconception is to think that Oracle needs 60 X 4KB X 25000 = 6 GB of
> virtual memory. Those 60 processes are mainly using the shared memory region
> in the process address space). (Tip - shared memory pages are backed by swap
> space, not by a file. The absolute minimum swap must be at least the size of
> the SGA.) A process detaches the shared memory with shmdt(addr) and
> destroys the shared memory region completely with the IPC_RMID command of the
> shmctl system call. (Tip - the important commands are ipcs -b; look at field
> SEGSZ for shared memory size in use; sysdef -i and sysdef -i -n /dev/ksyms
> for IPC and resource table definitions; kill -9 <process id> to
> terminate (no core file) a hung process or kill -6 <process id> to abort (core
> file) a hung Oracle process. modload -p sys/shmsys at the command line or
> forceload: sys/shmsys in the system file may be needed if ipcs -b doesn't
> work correctly.) This is because the kernel is dynamic, meaning that file
> systems, drivers, and modules are loaded into memory when they are used, and
> the memory is returned if the module is no longer needed. (Vahalia - p155-158,
> p162-164)
>
> Semaphores are counters that are used by Oracle to monitor and control the
> availability of shared memory segments. Typically the process initializes the
> semaphore with semget, assigns ownership of the semaphore with semctl, and
> then updates the semaphore with semop. A process has to block until the
> semaphore operation has reached zero. A semaphore structure contains the
> following information - semaphore value, the PID of the process that last
> performed successfully, the number of processes waiting for the semaphore to
> increase, and the number of processes waiting for the semaphore to reach zero.
> (tip - ipc_perm and sem in ipc.h, sem.h) (System Services Guide - p68-77).
>
> Shared Memory and Semaphore Tunables in Solaris 2 relevant to Oracle. (Tip -
> semmnu = semmns = semmsl X semmni). There is no harm in setting the numbers
> too high since the Oracle instance will only allocate semaphores and shared
> memory as needed. The values are definitions not declarations.
>
>
> Name    Default  Min      Max            Reference                           Suggested
> ____    _______  ___      ___            _________                           _________
> shmmax  1048576  1048576  Available RAM  Maximum shm segment size in bytes   50% of RAM
> shmmin  1        1        -              Minimum shm segment size in bytes   1
> shmmni  100      100      -              Number of shm id to pre-allocate    100
> shmseg  6        6        -              Maximum number shm seg per process  32
> semmni  10       10       65535          Number of semaphore identifiers     64
> semmns  60       -        -              Number of semaphores in system      1600
> semmnu  30       -        -              Number of undo structures in sys    1250
> semmsl  25       -        -              Maximum number of semaphores per ID 25 (fixed)
>
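The tip above (semmnu = semmns = semmsl X semmni) is easy to check against
the suggested column of the table. A minimal Python sketch follows; the
helper names semaphore_totals and etc_system_lines are made up for this
example, and only the tip's relation is applied, so semmnu comes out as 1600
rather than the table's suggested 1250.

```python
def semaphore_totals(semmni, semmsl):
    """Apply the tip: semmns = semmnu = semmsl * semmni."""
    semmns = semmsl * semmni
    return {
        "semmni": semmni,   # number of semaphore identifiers
        "semmsl": semmsl,   # maximum semaphores per identifier
        "semmns": semmns,   # semaphores in the system
        "semmnu": semmns,   # undo structures, set equal to semmns per the tip
    }

def etc_system_lines(values):
    """Format the values in the style of /etc/system 'set' entries."""
    return [f"set semsys:seminfo_{name}={value}" for name, value in values.items()]

if __name__ == "__main__":
    # 64 identifiers and 25 semaphores per identifier are the suggested
    # values from the table above.
    for line in etc_system_lines(semaphore_totals(semmni=64, semmsl=25)):
        print(line)
```

With the suggested semmni of 64 and semmsl of 25 the product is 1600, which
matches the suggested semmns in the table.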
> Solaris Tuning According to the Experts
> Every month in SunWorld Online, the performance experts at Sun write articles
> on tuning. In addition to the well known book, "Sun Performance and Tuning",
> Adrian Cockcroft with the help of Rich Pettit has put together a series of
> scripts called se2.5 (www.sun.com/960301/columns/adrian/se2.5.html). Hal
> Stern, another well known Sun tuning guru, has written an O'Reilly press book
> on "Managing NFS & NIS" and he too writes articles that can be downloaded off
> of the web. Fellow SunService Engineers Chris Drake and Kimberley Woods wrote
> "Panic - System Core dump Analysis" which contains detailed information on the
> Solaris kernel and common techniques used to analyze core files. Brian
> Wong the hardware expert has written a book called "Configuration and Capacity
> Planning of Large Sun Servers". Most of the tuning information for large Sun
> Servers running Oracle can be found in these sources. Since many customers
> often call SunService for further explanations, it is appropriate to highlight
> some common questions and answer them as the experts would.
>
> Question 1 - Where is all my Memory?
> Probably the most common performance question of all is "Why does vmstat
> report only xxxx amount of free memory available?" To use an example, type
> vmstat 5 and suppose the system shows freemem of 80708 and available swap of
> 330000. Now start the application and observe that the freemem goes down to
> 8824 and swap goes to 300000. Now stop the application and observe that all
> of the available swap returns to 330000 but the freemem returns only to
> 21260. Where then is all of the RAM? Do we have a memory leak? The answer
> is probably no because as Cockcroft notes "(the app) starts up more quickly
> than it did the first time, and with less disk activity. The application code
> and its data files are still in memory, even though they are not active. The
> memory they occupy is not "free." If you restart the same application it
> finds the pages that are already in memory. The pages are attached to the
> inode cache entries for the files. If you start a different application, and
> there is insufficient free memory, the kernel will scan for pages that have
> not been touched for a long time, and "free" them. Once you quit the first
> application, the memory it occupies is not being touched, so it will be freed
> quickly for use by other applications." (Cockcroft 1) Leaving parts of the
> app in memory even after termination is efficient because "Attaching to a
> page in memory is around 1,000 times faster than reading it in from disk."
> (Cockcroft 1) So how can one know if he has a memory leak in his application?
> The answer is there will be a shortage of swap space after the program runs
> a while and the SZ field in ps -elf for that app will grow over time.
>
> Question 2 - My Oracle Server is slow. Can you help me tune the kernel?
> The answer depends on the version of the operating system and the level of the
> patches. Early versions of the os had performance bugs and incompatible
> hardware that were the cause of slow performance. The latest version of the os
> is self-tuning for high performance and will work quite successfully on
> systems ranging from a huge SparcCenter 2000 to small desktops. As Cockcroft
> says "In normal use there is no need to tune the Solaris 2 kernel, since it
> dynamically adapts itself to the given hardware configuration and application
> workload." (Cockcroft 2) However for really large Oracle servers some tuning
> may be needed if using early versions of Solaris 2.3, 2.4 and 2.5 without a
> kernel patch that automatically adjusts the paging algorithm. Solaris 2.5.1 is
> self tuning for large memory systems. Paul Faramelli of the kernel TSE group
> has put together the following list of tunables for Solaris.
>
> Recommendations for large Oracle servers (Ram > 1 GB) are listed. (Tip - Use
> crash to display kernel tunables. As root type crash. At the greater than
> prompt, type "od -d maxusers" or "od -d lotsfree". The od stands for octal
> dump, and the -d stands for decimal. By the way every Solaris tunable [even
> undocumented ones] can be displayed by typing nm /kernel/unix). Note these
> recommendations are only necessary for early versions of Solaris. Some
> recommendations are provided by Steve O'Neil of SunService. (Caution - there
> is no right answer)
>
> Parameter        Description                                                 Recommended
> ---------        -----------                                                 -----------
> dump_cnt         Size of the dump
>
> autoup           Used in struct var for dynamic configuration of the age     300
>                  that a delayed-write buffer must be, in seconds, before
>                  bdflush will write it out (default = 60)
>
> bufhwm           Used in struct var for v_bufhwm; it's the high water mark   8000
>                  for buffer cache memory usage, in Kbytes (2% of memory).
>
> maxusers         Maximum number of users (In 2.3 and 2.4 the default is
>                  number of Megabytes in memory)
>
> max_nprocs       Maximum number of processes (10 + 16 * maxusers)
>
> maxuprc          The maximum number of user processes. (max_nprocs - 5)
>
> rstchown         POSIX_CHOWN_RESTRICTED is enabled (default = 1)
>
> ngroups_max      Maximum number of supplementary groups per user (def 32).
>
> rlim_fd_cur      Maximum number of open file descriptors per process system
>                  wide (default = 64, max = 1024)
>
> ncallout         Number of callout buffers (default = 16 + max_nprocs).
>                  (No longer exists in Solaris 2.2 and later releases)
>
> nautopush        Number of entries in the autopush free list                 1024
>
> sadcnt           Number allowed of concurrent opens of both /dev/sad/user    2048
>                  and /dev/sad/admin (default 16).
>
> npty             Number of 4.X pseudo-ttys configured (default 48)           1024
>
> pt_cnt           Number of 5.X pseudo-ttys configured (default 48)           1024
>
> physmem          Sets the number of pages usable in physical memory. Only
>                  use this for testing, it reduces the size of memory.
>
> minfree          Memory threshold which determines when to start swapping    100
>                  processes, when free memory falls to this level swapping
>                  begins (default: 2.4 - 4d = 50 pages, all others 25
>                  pages, 2.3 - physmem / 64 ).
>
> desfree          This is the "desperation" level, this determines when       200
>                  paging is abandoned for swapping. When free memory stays
>                  below this level for 30 seconds, swapping kicks in ( 2.4
>                  4d = 100 pages, all others 50 pages, 2.3 physmem / 32 ).
>
> lotsfree         Memory threshold which determines when to start paging.     512
>                  When free memory falls below this level paging begins (2.4
>                  4d = 256 pages all others 128 pages, 2.3 physmem /16)
>
> fastscan         The number of pages scanned per second when free memory
>                  is zero, the scan rate increases as free memory falls
>                  from lotsfree to zero, reaching fastscan ( default: 2.4
>                  physmem / 4 with 64Mb being max, 2.3 physmem / 2 ).
>
> slowscan         The number of pages scanned per second when free memory
>                  is equal to lotsfree, also see fastscan ( defaults: 2.4
>                  is fixed at 100, 2.3 fastscan /10 ).
>
> handspreadpages  Is the distance between the front hand and backhand in
>                  the clock algorithm. The larger the number the longer an
>                  idle page can stay in memory (default: 2.4 physmem / 4
>                  2.3 physmem / 2 ).
>
> maxpgio          The maximum number of page-out I/O operations per second.   120
>                  This acts as a throttle for the page daemon to prevent
>                  page thrashing ((DISKRPM * 2) /3 = 40). This parameter
>                  must be set higher if using two swap partitions.
>
> t_gpgslo         2.1 through 2.3, Used to set the threshold on when to
>                  swap out processes (default 25 pages ).
>
> ufs_ninode       Maximum number of inodes. (max_nprocs+16+maxusers+64)       34906
>
> ndquot           Number of disk quota structures. (default = (maxusers *
>                  NMOUNT / 4) + max_nprocs)
>
> ncsize           Number of dnlc entries. (default = max_nprocs + 16 +        34906
>                  maxusers + 64); dnlc is the directory-name lookup cache
>
>
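Several entries in the table above are derived from maxusers by simple
formulas. The Python sketch below just evaluates those formulas; the
function name derived_defaults is hypothetical, and maxusers = 2048 is an
assumption chosen because it reproduces the 34906 shown in the Recommended
column (the paper does not state which maxusers value was used).

```python
def derived_defaults(maxusers):
    """Evaluate the default formulas quoted in the tunables table."""
    max_nprocs = 10 + 16 * maxusers
    cache_entries = max_nprocs + 16 + maxusers + 64
    return {
        "max_nprocs": max_nprocs,      # 10 + 16 * maxusers
        "maxuprc": max_nprocs - 5,     # max_nprocs - 5
        "ufs_ninode": cache_entries,   # max_nprocs + 16 + maxusers + 64
        "ncsize": cache_entries,       # same formula as ufs_ninode
    }

if __name__ == "__main__":
    # maxusers=2048 is an assumed input, used only for illustration.
    for name, value in derived_defaults(2048).items():
        print(f"{name:<12}{value}")
```

Because these values scale with maxusers, lowering maxusers (as Cockcroft
mentions below for machines with gigabytes of RAM) shrinks all of them
together.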
> Cockcroft on maxusers
> "I never set maxusers. It sizes itself based on the amount of RAM in the
> system. In some cases on configurations with gigabytes of RAM it needs to be
> reduced to avoid problems with lack of kernel address space. The kernel uses
> up a lot of space keeping track of all the RAM in a system. Several other
> kernel table sizes and limits are derived from maxusers." (Cockcroft 2)
>
> Cockcroft on ncsize
> "The directory name lookup cache (DNLC) is sized to a default value based on
> maxusers. A large cache size (ncsize) significantly helps NFS servers that
> have a lot of clients. On other systems the default is adequate." (Cockcroft 2)
>
> Question 3: How much swap is needed for a large Oracle database?
> Many people are under the impression that very little swap is needed for
> Oracle because the architecture uses temporary tablespaces for sorting and the
> SGA is fixed in memory. Well the truth is large databases require a lot of
> swap. The shared memory segment is backed by swap so the allocated swap MUST
> be at least as large as the shared memory segments. In addition when the
> database uses intimate shared memory this is also backed by swap. All of the
> Oracle processes must be partially backed by swap. Steve Schuettinger, the
> Oracle applications specialist at Sun, recommends at least 2 GB of swap for
> benchmark testing on large servers. Obviously since RAM plus swap equals
> virtual memory, once swap is gone, the program will halt and no new apps can
> be started until other programs have stopped. As Adrian Cockcroft says "The
> important thing to realize about swap space is that it is the combined total
> size of every program running and dormant on the system that matters. When a
> system runs out of swap space it can be very difficult to recover. Sometimes
> you find that there is insufficient swap space left to login as root or run
> the commands needed to kill the errant process that is consuming all the swap
> space." (Cockcroft 3) In theory Solaris 2 changes the rules by adding the RAM
> and the disk space so if the system has enough RAM for the workload, "it can
> run with no swap disk. In practice common database applications that are
> sized to run in a few gigabytes of RAM will actually need many gigabytes of
> disk allocated as swap space." (Cockcroft 3) In the same article Cockcroft
> says "The consequences of running out of swap space affect a larger number of
> users on a big server, so it is wise to allocate a lot more than you normally
> need to cope with any usage peaks. To start with, add twice as much disk as
> you have RAM." (Cockcroft 3) (Tip - It is not worth making a striped
> metadevice to swap on - that would just add overhead and slow it down. There
> is also a limit of 2 gigabytes on the size of each swap partition, so striping
> disks together tends to make them too big.)
>
> /usr/ucb/ps alx, fields SZ or SIZE, /usr/proc/bin/pmap
>
> % /usr/ucb/ps alx
>  F  UID  PID PPID CP PRI NI  SZ RSS WCHAN    S TT    TIME COMMAND
>  8 2595 1133 1130  0  48 20 988 360 modlinka S pts/4 0:00 -bin/csh
>
> There is confusion about what ps reports. The "/bin/ps prints a field
> labelled SZ, but this is the resident set size in RAM -- printed as RSS by the
> /usr/ucb/ps. You need to use the SZ or SIZE field reported by /usr/ucb/ps alx
> in units of kilobytes to determine the amount of swap space used by the
> process." (Cockcroft 3)
>
> Oracle's Mark Johnson adds the following "I had thought the standard Oracle
> rule of thumb was 2 to 4 times physical memory (can be a bit less on very
> large memory systems). Smaller memory systems may want to use higher ratios
> of SGA size to physical memory size and higher swap space ratios. (I ended
> up using ratios of 1:1 and 1:4 for a very small Solaris for Intel system with
> surprisingly good results.)"
>
> Hal Stern says "So why do you need swap space if your SGA << phys mem? The
> short answer is that the "phys mem" in that calculation is the
> non-locked-down physical memory, and when you allocate an oracle SGA, you
> allocate intimate shared memory (ISM) that is taken out of the physical memory
> pool (ie, it gets locked down). so on a 1 Gbyte machine, you may think you're
> ok with a 256M SGA, leaving 700M+ for processes. BUT: the 256M SGA gets taken
> out of the available memory pool, so your maximum VM is only 700M+, and you
> could probably use the swap space....as the SGA/memory ratio goes up, this is
> even more true." (private letter from Stern)
>
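The swap guidance in this section reduces to a little arithmetic. The Python
sketch below works through Stern's 1 Gbyte machine with a 256M SGA; the
function name swap_plan and the dictionary keys are invented for this
example, and the 2x multiplier is Cockcroft's "to start with" figure rather
than a fixed rule.

```python
import math

def swap_plan(ram_mb, sga_mb, multiplier=2, max_partition_mb=2048):
    """Summarize the swap rules of thumb quoted in this section."""
    starting_swap_mb = multiplier * ram_mb
    return {
        # ISM locks the SGA in RAM, so only the remainder is pageable.
        "unlocked_ram_mb": ram_mb - sga_mb,
        # Shared memory is backed by swap: swap must be at least the SGA size.
        "min_swap_mb": sga_mb,
        # "To start with, add twice as much disk as you have RAM."
        "starting_swap_mb": starting_swap_mb,
        # Each swap partition is limited to 2 gigabytes.
        "swap_partitions": math.ceil(starting_swap_mb / max_partition_mb),
    }

if __name__ == "__main__":
    for name, value in swap_plan(ram_mb=1024, sga_mb=256).items():
        print(f"{name:<18}{value}")
```

For the 1 Gbyte example this leaves 768 MB of unlocked RAM (Stern's "700M+")
and suggests starting with about 2 GB of swap, which fits in one partition.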
>
> Question 4 - Will a faster cpu help performance?
> The question is not easy to answer. As Hal Stern noted "Noticing that you're
> using 20 percent of the CPU doesn't mean anything until you know the kind of
> work that's using the cycles. If you're CPU-bound, then you have headroom to
> increase the workload by a factor of four or five. An I/O-bound job, however,
> that uses 20 percent of the CPU might be improved by adding disk spindles. As
> you increase the disk count and I/O load, to ease the bottleneck, you'll use
> more CPU to deal with the I/O setup, system calls, and interrupts from the
> additional work. You run the risk of morphing a disk problem into a CPU
> shortage. How do you know when relaxing one constraint pops another one into
> the foreground? Define the right relationships -- CPU time used per disk I/O
> tells you how much system time you eat up as you add disk load -- and measure
> with your tailored yardstick." (Stern 1)
>
> Preventing Kernel Memory Starvation
> When Oracle is working very hard and the operating system is Solaris 2.3 or
> early Solaris 2.4, it is possible to have kernel memory allocation faults
> that can eventually lead to kernel memory starvation. A new memory allocator
> algorithm has been developed and integrated into Solaris 2.5.1 (the old
> allocator had paging thresholds that were too low, which caused kernel memory
> allocation failures on very large systems). The allocator has been back
> ported to rev 40 of the Solaris 2.4 jumbo patch and to a future rev of the
> 2.5 jumbo patch. No fix has yet been developed for Solaris 2.3. (Tip - large
> database users should upgrade to Solaris 2.4 or better). In the past Oracle
> customers could manually adjust paging thresholds. The actual value that
> needed to be set was proportional and depended upon the amount of memory and
> the number of cpus on the system. Also in some cases decreasing maxusers and
> bufhwm would mitigate the problem. The total allowable size for the kernel on
> the ultrasparc servers running 2.5 is now so large that kernel memory
> allocation problems on very large systems are virtually impossible. See
> examples below. The crash output displaying kernel memory starvation is taken
> from a SparcServer 1000 running Solaris 2.3 with 1 GB of ram and 8 cpus.
>
> Kernel memory limits:
>
>            Solaris 2.4    Solaris 2.5
>   sun4c        33MB           33MB
>   sun4m        61MB          100MB
>   sun4d       139MB          251MB
>   sun4u          -          2525MB
>
> $> kas crash 15
> > map kernelmap
> FREE: 2042 WANT: 1 SIZE: 2042
> SIZE ADDRESS TOTAL
>
> NUMBER OF SEGMENTS 0 TOTAL SIZE 0
> > kmastat
>                       total bytes    total bytes
> size       # pools       in pools      allocated    # failures
> -----------------------------------------------------------------
> small         6807       26138880       25677584       1989915
> big           2652       75276288       73046528             0
> outsize          -              -       18571264         45351
>
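The kmastat figures above can be totalled with a short script. The Python
sketch below is an illustration only and not part of the quoted paper: the
parsing function is hypothetical, and it assumes the five columns are size
class, number of pools, bytes in pools, bytes allocated and number of
failures, with "-" meaning "not applicable".

```python
# Rows copied from the kmastat output in the crash session above.
KMASTAT = """\
small    6807  26138880  25677584  1989915
big      2652  75276288  73046528        0
outsize     -         -  18571264    45351
"""


def parse_kmastat(text):
    """Parse rows of: size, # pools, bytes in pools, bytes allocated, # failures."""
    rows = {}
    for line in text.splitlines():
        name, pools, in_pools, allocated, failures = line.split()
        rows[name] = {
            "pools": None if pools == "-" else int(pools),
            "in_pools": None if in_pools == "-" else int(in_pools),
            "allocated": int(allocated),
            "failures": int(failures),
        }
    return rows


rows = parse_kmastat(KMASTAT)
total_allocated = sum(r["allocated"] for r in rows.values())
total_failures = sum(r["failures"] for r in rows.values())

print(f"allocated: {total_allocated / 2**20:.1f} MB, failures: {total_failures}")
for name, r in rows.items():
    if r["failures"]:
        print(f"{name}: {r['failures']} failed allocations")
```

On these figures the three size classes add up to about 112MB allocated and
just over two million failed allocations, almost all in the "small" class.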
> Crash is a very powerful tool that helps analyze
kernel memory allocation
> failures. In the output, "TOTAL SIZE 0"
indicates that no more free
> kernel memory exists. The FREE field (2042)
indicates that there is still
> plenty of memory in the user portion of the virtual
address space. Carl of
> Sunsoft provides an explanation of kernel map
scarcity under Solaris 2.3 and
> Solaris 2.4. "In the overwhelming majority of
cases on large database
> servers, we have found that 64MB is overly generous
for bufhwm in that
> it can be cut back by one-half (to 32MB) without
too much of an impact on the
> cache hit ratio. What is usually in short supply on
these machines is not the
> buffer cache but the amount of kernel heap (mapped
by kernelmap) that remains
> for non-buffer cache usage. Limiting buffer cache
growth to 32MB frees up an
> additional 32MB for the heap and has proven successful
in avoiding kernelmap
> scarcity at a number of sites running large
database applications. Kernelmap
> scarcity (or equivalently kernel heap scarcity as
the size of the kernel heap
> is limited by the size of the address space the
kernelmap can map) results in
> an extreme slowdown of processing in the systems.
All of a sudden kernelmap
> becomes a scarce resource that every thread
contends for and to exacerbate
> the situation the rate of release is slowed by the
very same contention to
> the point that kernelmap turnover grinds down
almost to the point of
> deadlock. Why 64MB's worth of kernelmap is
inadequate for the largest
> database servers is unknown. The sites on which
this has been a
> problem have been checked for kernelmap leakage and
none has been found. There has
> also been a problem in the past with some kernel
data structures being
> preallocated from the heap and the size of this
preallocation being
> inappropriately scaled to physical memory. As it
is fairly common now
> for machines to be equipped with 3GB of physical
memory, this was not the
> right thing to do and did account for some
kernelmap depletion headaches. But
> this particular bug has been fixed. With these two
things discounted, the
> only conclusion is that modern database workloads
are driving up peak
> transient demands for kernelmap to the 100MB
level." (Tip - for large databases
> running Solaris 2.4 or earlier, set bufhwm to 8000 on
sun4c, sun4m, and sun4d, or upgrade to
> Solaris 2.5, which has a larger kernel map address
space.)
>
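The bufhwm tip can be turned into an /etc/system line with a little
arithmetic. The Python sketch below is an illustration only and not part of
the quoted paper: the helper names are hypothetical, and it assumes bufhwm is
expressed in kilobytes and that the starting point is the 64MB buffer cache
limit mentioned in the quotation.

```python
KB_PER_MB = 1024


def bufhwm_entry(kbytes):
    """Format an /etc/system line that caps the buffer cache at `kbytes` KB."""
    return f"set bufhwm={kbytes}"


def heap_freed_mb(old_kb, new_kb):
    """Kernel address space (in MB) released by lowering bufhwm."""
    return (old_kb - new_kb) / KB_PER_MB


# The setting suggested in the tip.
print(bufhwm_entry(8000))

# The 64MB -> 32MB reduction described in the quotation.
print(heap_freed_mb(64 * KB_PER_MB, 32 * KB_PER_MB))

# Going from an assumed 64MB limit down to the suggested 8000 KB.
print(heap_freed_mb(64 * KB_PER_MB, 8000))
```

A change to /etc/system takes effect only after a reboot, so check the
resulting line before restarting the server.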
> Acknowledgements
> I want to thank Sun performance gurus Adrian
Cockcroft and Hal Stern
> for their contributions to this paper. UNIX
architect Mark Johnson of
> Oracle and database expert Jim Skeen of Sunsoft
provided comments on Oracle
> internals. Kernel architect Jeff Bonwick has added
explanations and suggestions
> regarding kernel memory allocation and kernel
memory starvation.
> SunService kernel engineer Paul Faramelli
documented the Solaris tuning parameters
> and SunService Technical Expert Steve O'Neil
provided recommendations for
> tuning large Oracle databases on versions of
Solaris that are not self tuning.
>
> Finally I want to thank Uresh Vahalia who gave me
permission to quote
> at length from his wonderful book "UNIX Internals -
The New Frontiers".
>
> Disclaimer
> The author alone is responsible for the contents of
this paper. No one
> at Sun Microsystems, Sunsoft, SunService, or the
Oracle corporation has
> reviewed or approved the paper for completeness or
accuracy in its published
> format and nothing in the paper can be construed as
the official policy of Sun
> Microsystems or the Oracle Corporation.
>
> References
> UNIX Internals - The New Frontiers by Uresh
Vahalia, Prentice Hall 1996
>
> "How the Solaris Kernel is Optimized for Oracle" by
Mike Jaffee 1996
>
> "Shared Page Table: Virtual Memory Enhancement for
Data Sharing in
> UNIX" H.Yoo
>
> "Comparative analysis of Asynchronous I/O in
Multithreaded UNIX" Hyuck
> Yoo
>
> "Help! I've lost my memory!" by Adrian Cockcroft,
SunWorldOnline 1995
> (1)
>
> "What are the tunable kernel parameters for Solaris
2?" by Adrian
> Cockcroft (2)
>
> "How does swap space work?" by Adrian Cockcroft,
SunWorldOnline 1995
> (3)
>
> "We suggest creative ways to better your system
performance" by Hal
> Stern
>
> System Service Guide - Solaris 2.4 Manual, SunSoft,
1994
>
> "The Slab Allocator: An Object-Caching Kernel
Memory Allocator" Jeff
> Bonwick
>




This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:12:07 CDT