lot of connections in Time_wait

Raj rajwin at yahoo.com
Thu Sep 16 05:49:23 EDT 2004


Hi Gurus,
  Thanks for the replies. Following is the summary:

TIME_WAIT connections have no effect on server load. There are just
lots of connections. TIME_WAIT is your friend unless it prevents more
connections from being created or being serviced. TIME_WAIT is a state
where connections could conceivably be re-used without making a new TCP
connection between servers, so it can help reduce load.

I'm enclosing a paper I'm writing on the subject of
TCP tuning in
Solaris. Read the section on TIME_WAIT
(tcp_time_wait_interval).

================

tcp_time_wait_interval
"TIME_WAIT is your friend!"

This section will attempt to explain the parameter
tcp_time_wait_interval and address the proper way to
determine values
for the parameter.

A TCP connection enters TIME_WAIT as a result of the
connection being
terminated by one or both sides. If a TCP connection
is in TIME_WAIT
state and the remote transmits data, the side having
the TCP connection
in TIME_WAIT sends a RST to indicate that data was
lost.
tcp_time_wait_interval is the length of time a TCP
connection can
remain in the TIME_WAIT state. The parameter is
reported and set in
milliseconds.

One of the most frequently asked questions is, "Why is the default
setting in Solaris 8 so high?" The answer is: it isn't.

From RFC 1122 (section 4.2.2.13) and RFC 793:

"The graceful close algorithm of TCP requires that the
connection state
remain defined on (at least) one end of the
connection, for a timeout
period of 2xMSL, i.e., 4 minutes"

"TIME-WAIT - represents waiting for enough time to pass to be sure
 the remote TCP received the acknowledgment of its connection
 termination request."

"TIME-WAIT state removes the hazard of old duplicates for "fast"
 or "long" connections, in which clock-driven ISN selection is
 unable to prevent overlap of the old and new sequence spaces.
 The TIME-WAIT delay allows all old duplicate segments time
 enough to die in the Internet before the connection is reopened."

The default in Solaris 8 is 240000 milliseconds (4 minutes). This
value was good for 300 baud connections, but may be a bit too long for
today's networks and applications. One side effect of using the default
value of 4 minutes is that TCP connections can exist in the TIME_WAIT
state for 4 minutes. It is possible that TCP connections remaining in
TIME_WAIT for this period of time can cause excessive resource
consumption in the operating system. "Excessive" here means that
resources that could have been used for new TCP connections are tied up
in TCP connections in the TIME_WAIT state. A Solaris deployment can
function normally with tens of thousands of TCP connections in
TIME_WAIT. System administrators sometimes set tcp_time_wait_interval
to low values with the expectation that TCP performance will be
enhanced. TCP performance will not be enhanced by decreasing
tcp_time_wait_interval.

Another factor to consider is the TCP control block
chain.
tcp_conn_hash_size controls the number of TCP control
blocks in the
chain. Consider how many of these blocks would be
associated with
inactive connections when you measure TCP performance
and memory
consumption. See also 

Additionally, RFC 1122 allows a TCP connection in TIME_WAIT to be
re-used when the remote sends a new SYN segment, provided the local TCP:
	assigns its initial sequence number for the new connection to be
	  larger than the largest sequence number it used on the previous
	  connection incarnation, and
	returns to TIME-WAIT state if the SYN turns out to be an old
	  duplicate.
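
The first condition above can be illustrated with a small sketch.
TCP sequence numbers are 32-bit and wrap around, so "larger" has to be
judged in wrap-around arithmetic. The helper names seq_gt and pick_isn
and the offset parameter are hypothetical, chosen only for
illustration; this is not how Solaris selects an ISN.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_gt(a: int, b: int) -> bool:
    """Return True if a is 'larger' than b in wrap-around arithmetic."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


def pick_isn(largest_seq_used: int, offset: int = 1) -> int:
    """Pick an ISN ahead of the largest sequence number used by the
    previous incarnation (hypothetical helper, illustration only)."""
    if not 0 < offset < SEQ_MOD // 2:
        raise ValueError("offset must be between 1 and 2**31 - 1")
    return (largest_seq_used + offset) % SEQ_MOD
```

Note that an ISN that has wrapped to a small number still counts as
larger than an old sequence number near the top of the 32-bit range.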

If tcp_time_wait_interval is set too low, a TCP
connection cannot be
re-used, resulting in the kernel recovering a resource
that could have
been re-used, requiring more operating system
resources.

Advantages of decreasing tcp_time_wait_interval from the default
include:
	more rapid recovery of system resources associated with sockets
	more connections can be handled
	less memory consumption

Disadvantages of decreasing tcp_time_wait_interval include:
	more CPU time spent in recovering connections
	there is a possibility that data loss can occur without
	  notification if set too low
	connections could be refused if old duplicate SYN segments exist
	the connection cannot be re-used (new SYN)

The key to tuning tcp_time_wait_interval is that
evidence from testing
should be used to justify changing this parameter.
There are many
documents on the Internet and within Sun that would
seem to indicate
that tcp_time_wait_interval should always be reduced
as a matter of
course. Measure twice and cut once!

Some software uses "pooled" TCP connections. This means that a set of
connections is opened to a server and is never, or only rarely,
closed. This results in very few TCP connections being in TIME_WAIT at
a given moment. There is no need to adjust tcp_time_wait_interval on
such systems.  An example of pooling is the
interaction between the Sun
ONE Web Proxy Server and the Sun ONE Directory Server
when the Web
Proxy Server is used in LDAP authentication mode. The
Web Proxy Server
opens a configurable number of LDAP connections to the
Directory Server
and leaves them open for the life of the Web Proxy
Server processes.
Thus, there are very few closes of TCP connections on
the Directory
Server, very few TCP connections in TIME_WAIT, and
therefore, no need
to decrease tcp_time_wait_interval on the Directory
Server. Another
very similar interaction is found in deployments of
the Sun ONE
Directory Proxy Server. The Directory Proxy Server
opens a pool of
connections to Directory Servers and leaves these
connections open,
resulting in few socket closes and hence, few
connections in TIME_WAIT.

On a system where TCP connections are short-lived,
decreasing
tcp_time_wait_interval will result in recovering
system resources more
quickly, which can result in the system being able to
accept or create
more TCP connections. A Web Server is a prime example
of a system with
many short-lived TCP connections. The best way to
determine if
tcp_time_wait_interval should be decreased on a system
is to examine
log files and note whether the software is unable to
accept or create
TCP connections. If this is the case, there may be
other ways to
ameliorate this condition, for example, increasing the
number of file
descriptors available to processes.  Another symptom
of
tcp_time_wait_interval being set too high is excessive
memory
consumption. Carefully monitor memory consumption using sar to
determine if memory utilization is too high.

The iPlanet Web Server 6.0 Performance Tuning, Sizing,
and Scaling
Guide states categorically that tcp_time_wait_interval
should be set to
60000 milliseconds.  Taken in the context that these
values were used
for performance benchmarking, this value is
reasonable. This does not
mean that every Solaris Web Server should use this
value. Setting
tcp_time_wait_interval too low can result in too much
time spent by the
kernel recovering sockets, no notification of data
loss to a
transmitter on a closed connection, and no way to
re-use the connection
with a new SYN segment.

The Solaris 9 Tunable Parameters manual states, "Do not set the value
lower than 60 seconds. For more information, refer to RFC 1122,
4.2.2.13." The system administrator should have a clear understanding
of that section of RFC 1122 before changing this parameter. The default
for this parameter in Solaris 9 is 60000 milliseconds.

Solaris SPECmail2001 and SPECweb99 settings were 60000
milliseconds for
tcp_time_wait_interval.
An often-used web resource,
http://www.sean.de/Solaris/soltune.html#system, recommends categorically
that tcp_time_wait_interval be set to 60000 milliseconds. The author
also states that "As Stevens repeatedly states in his books, the
TIME_WAIT state is your friend. You should not desperately try to avoid
it, rather try to understand it. The maximum segment lifetime (MSL) is
the maximum interval a TCP segment may live in the net. Thus waiting
twice this interval ensures that there are no leftover segments coming
to haunt you. This is what the 2MSL is about. Afterwards it is safe to
reuse the socket resource."

tcp_time_wait_interval should generally be left at the default setting,
unless empirical evidence exists to change this parameter.
Tests show that a system must have hundreds of
thousands of TCP
connections to warrant changing
tcp_time_wait_interval. When a system
has too many connections in TIME_WAIT, decrease
tcp_time_wait_interval.
The question to ask is: how many is too many? The only
answer is: when
new TCP connections cannot be accepted or created
because of the many
sockets in TIME_WAIT. When tcp_time_wait_interval is decreased, keep in
mind the time factor. If a system has 100 sockets per second entering
TIME_WAIT, then decreasing tcp_time_wait_interval by 30 seconds (30000
milliseconds) leaves roughly 3000 fewer sockets (100 per second x 30
seconds) in TIME_WAIT at any given moment.
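
The time factor can be estimated with a back-of-the-envelope
calculation: at a steady rate of closes, the number of sockets sitting
in TIME_WAIT is roughly the rate multiplied by the interval. The sketch
below assumes a constant close rate and that every close leaves a
TIME_WAIT entry on this host; the function name is hypothetical and the
100 closes per second figure is only an example.

```python
def sockets_in_time_wait(closes_per_second: float, interval_ms: int) -> float:
    """Steady-state estimate: close rate times seconds spent in TIME_WAIT."""
    return closes_per_second * (interval_ms / 1000.0)


# Example rate of 100 closes per second (an assumption, not a measurement).
at_240000_ms = sockets_in_time_wait(100, 240000)  # Solaris 8 default
at_60000_ms = sockets_in_time_wait(100, 60000)    # Solaris 9 default
print(at_240000_ms, at_60000_ms, at_240000_ms - at_60000_ms)
```

Comparing such an estimate against the actual count of TIME_WAIT
sockets is one way to check whether the interval, rather than something
else, explains what is observed.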

System administrators can falsely believe that TIME_WAIT is their
enemy. TIME_WAIT is the friend of the administrator.
Use evidence for
changing tcp_time_wait_interval. Do not change it
unless you can prove
to yourself that the system needs it to be changed. In
other words, ten
thousand TCP connections in TIME_WAIT are not a bad
thing unless they
are preventing the system from creating or accepting
new TCP
connections. Use the netCheck.pl and  scripts to get a
quick look at
TCP connection states.
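
For readers without those scripts, here is a minimal sketch of the same
idea: tally TCP connection states from the text output of netstat -an.
This is not netCheck.pl. The function name and the set of state names
are assumptions to adjust for your system; the sketch relies only on
the state being the last column of each connection line.

```python
from collections import Counter

# State names as commonly printed by netstat; adjust for your platform.
TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "FIN_WAIT_1", "FIN_WAIT_2",
    "SYN_SENT", "SYN_RCVD", "LAST_ACK", "CLOSING", "LISTEN",
}


def count_tcp_states(netstat_output: str) -> Counter:
    """Count lines whose last whitespace-separated field is a TCP state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


# Made-up sample lines for illustration, not real output.
sample = """\
192.168.1.10.80   10.0.0.5.40123   64240  0 64240  0 TIME_WAIT
192.168.1.10.80   10.0.0.6.40200   64240  0 64240  0 TIME_WAIT
192.168.1.10.1521 192.168.1.11.3301 64240 0 64240  0 ESTABLISHED
"""
print(count_tcp_states(sample))
```

On a live system, the output of netstat -an could be saved to a file and
its contents passed to count_tcp_states.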



--- Raj <rajwin at yahoo.com> wrote:

> Hi Gurus,
>     There are more than 4000 connections in the
> TIME_WAIT condition in our SUN server. The server is

> running apache, tomcat and oracle. 
> 
> Why there are so many connections in TIME_WAIT. The
> load on the server is more than 15.
> 
> Regards,
> Raj


=====
Terry Gardner
Boo's Dad



		