SUMMARY: Bind error - address already in use

From: Michael D'Errico (mike@software.com)
Date: Tue Jul 06 1993 - 03:41:51 CDT


Hey everyone,

This was my original note to the net:

> I have been trying out sockets in Perl and have run into the following
> snag. A process does socket->bind->listen->accept.....shutdown->exit.
> The next time I try to run the program, the bind() fails because the
> address is already in use.

> Is there a way to either 1) have the program close the socket without
> causing this problem, 2) have the program force a bind on a port even
> if it gets this error (I KNOW the port isn't being used), or 3) do some
> other tricks to make the socket available again from UNIX?

> Obviously (1) is most preferable, followed by (3).

I received many followups and emails which I've summarized below. I think
the real answer is that it doesn't work on Solaris 2.x the way it has
in the past, since none of the simple fixes seem to work. By simple
fixes, I mean adding a 'close' call or setting some socket options....

Since the problem is specific to Solaris 2.x, I have cross-posted this
summary to comp.unix.solaris also.

A fix can be had if you read the last article which I appended to the
bottom of this message. It's from George Ball of Sun in response to a
different question, but I figured it was relevant enough to post here.

Thanks go to (not in any order whatsoever):
        koppenh@dia.informatik.uni-stuttgart.de (Andreas Koppenhoefer)
        wucolin@popeye.CIS.McMaster.CA (Colin Wu)
        georgeb@ukcsd.uk.sun.com (George Ball- U.K. - Answer Centre contractor)
        huober_j@oracle.rz.uni-ulm.de (Joachim Huober)
        jwill@key.amdahl.com (John Williams)
        jpbelang@fatman.crim.ca (Jean-Philippe Belanger)
        Vivek Khera <khera@cs.duke.edu>
        Steven Parkes <steven@crhc.uiuc.edu>
        jeffp@BRS.Com (Jeffrey S. Pace)
        operator@rooney.fstrf.org (operator)

Michael D'Errico
mike@software.com

======================================================================
Use setsockopt after creating your socket and before (!) binding a
local address to it:
 
        require 'sys/socket.ph';
        setsockopt(S, &SOL_SOCKET, &SO_REUSEADDR, 1);
 
For details, see the setsockopt(2) man page.
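
As a minimal sketch of the ordering described above, assuming a Perl 5
installation where the core Socket module stands in for the h2ph-generated
'sys/socket.ph'. The sub name open_listener is hypothetical, and the bind
uses the loopback address with port 0 (kernel picks a free port) purely for
illustration:

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOL_SOCKET SO_REUSEADDR SOMAXCONN
              INADDR_LOOPBACK pack_sockaddr_in unpack_sockaddr_in);

# Hypothetical helper: create a TCP listener with SO_REUSEADDR set
# before bind(). Passing port 0 asks the kernel for any free port.
sub open_listener {
    my ($port) = @_;
    my $proto = getprotobyname('tcp');    # scalar context: protocol number
    socket(my $sock, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";

    # The option has to be set after socket() and before bind().
    setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, pack('l', 1))
        or die "setsockopt: $!";

    bind($sock, pack_sockaddr_in($port, INADDR_LOOPBACK)) or die "bind: $!";
    listen($sock, SOMAXCONN) or die "listen: $!";
    return $sock;
}

my $listener = open_listener(0);
my ($bound_port) = unpack_sockaddr_in(getsockname($listener));
print "listening on port $bound_port\n";
close($listener);
```

Whether this is enough on Solaris 2.x is exactly what the rest of this
summary calls into question, so treat it as the standard approach rather
than a guaranteed fix.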

======================================================================
I have never used Perl, but I get this error when I don't close() the file
descriptor (in C). I suppose the kernel has to reset the port when it has
time to do so.

To reuse an address, you have to (in C) call setsockopt() with the
SO_REUSEADDR option. This will not let you bind two sockets to the same port
on the same machine at the same time, though, as that is not allowed.
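
The limitation in the last sentence can be checked with a small Perl sketch.
It assumes Perl 5 with the core Socket module; both sub names are
hypothetical. The result is platform dependent, so the script reports what
it sees instead of assuming an outcome:

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOL_SOCKET SO_REUSEADDR
              INADDR_LOOPBACK pack_sockaddr_in unpack_sockaddr_in);

# Hypothetical helper: new TCP socket with SO_REUSEADDR already set.
sub reusable_socket {
    socket(my $sock, PF_INET, SOCK_STREAM, scalar getprotobyname('tcp'))
        or die "socket: $!";
    setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, pack('l', 1))
        or die "setsockopt: $!";
    return $sock;
}

# Bind a second socket to a port that a first socket is listening on.
# Returns the error text, or an empty string if the bind succeeded.
sub second_bind_error {
    my $first = reusable_socket();
    bind($first, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
    listen($first, 5) or die "listen: $!";
    my ($port) = unpack_sockaddr_in(getsockname($first));

    my $second = reusable_socket();
    my $err = bind($second, pack_sockaddr_in($port, INADDR_LOOPBACK))
        ? '' : "$!";
    close($second);
    close($first);
    return $err;
}

my $err = second_bind_error();
print $err ? "second bind refused: $err\n" : "second bind succeeded\n";
```

On Linux and BSD-derived stacks I would expect the second bind to be
refused while the first socket is still listening; other systems may behave
differently.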

======================================================================
I've had the same problem, but never found a solution. If you get any
responses, please summarize to the net. I've even tried close()'ing
the socket file handle, but that doesn't do the trick, either.

======================================================================
I assume you are using TCP: this is required by the TCP protocol. In order to
make sure that you don't accidentally get packets left over from a previous
connection, there is a minimum idle time on any TCP port. [I think this
timeout is 2*MSL, twice the maximum segment lifetime, which makes it a few
minutes. Sorry; I don't have the numbers in front of me. They can be gotten
from the RFCs.]

|> Is there a way to either 1) have the program close the socket without
|> causing this problem, 2) have the program force a bind on a port even
|> if it gets this error (I KNOW the port isn't being used), or 3) do some
|> other tricks to make the socket available again from UNIX?

Basically, no, since any form of allowing that would violate the TCP spec.

======================================================================
I have also been meaning to post this question. In my case,
if I use the netstat command, I see that the port is in
the FIN_WAIT2 state. The port apparently has to time out
before it is freed up again.

======================================================================
There's a socket option (see, um, I think it's setsockopt(3?)) called
SO_REUSEADDR you need to set on the socket to be able to use it again
immediately. Otherwise the system makes you wait x units of time.

======================================================================
I think the best way to do what you want is to fork a process before doing
the accept and leave the listening socket open. Or you could put your program
into inetd.conf (this is the best way, if you are a super user :-) )

Try

#! /usr/bin/perl

require 'sys/socket.ph';
require 'sys/wait.ph';
($server,$port) = @ARGV;
$port = 2350 unless $port;
$server = 'echo server not specified' unless $server;

$SIG{'CHLD'} = 'IGNORE';

$socketaddr = 'S n a4 x8';

(($name, $aliases, $proto) = getprotobyname('tcp')) || die;
((($name, $aliases, $port) = getservbyname($port, 'tcp')) || die)
        unless $port =~ /^\d+$/;
$this = pack($socketaddr, &AF_INET, $port, "\0\0\0\0");

select(NS); $| = 1; select(STDOUT);

socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
bind(S, $this) || die "bind: $!";
listen(S, 5) || die "listen: $!";

select(S); $| = 1; select(STDOUT);
for (;;) {
    until (defined($addr = accept(NS,S)))
    { die "$!" unless ($! =~ /Interrupted system call/) }
    FORK:
    {
        if ($child = fork)
        {
                close(NS);
        }
        elsif (defined($child))
        {
                # setsockopt takes four arguments; 1 turns the option on
                setsockopt(NS, &SOL_SOCKET, &SO_KEEPALIVE, 1)
                        || warn "setsockopt: $!";
                open(STDIN,"<&NS") || die "open: $!";
                open(STDOUT,">&NS") || die "open: $!";

                exec($server) || die "exec $!";
                close(NS);
                exit;
        }
        elsif ($! =~ /No more process/)
        {
                # EAGAIN, supposedly recoverable fork error
                sleep 5;
                redo FORK;
        }
        else
        {
                warn "Can't fork $!";
                close(NS);
        }
    }
}

======================================================================
The way I got around this problem was to set the SO_REUSEADDR option once the
socket was opened:

  setsockopt($sock, &SOL_SOCKET, &SO_REUSEADDR, 1);

&SOL_SOCKET and &SO_REUSEADDR are defined in sys/socket.ph (use h2ph to
generate it from /usr/include/sys/socket.h).

======================================================================
From: georgeb@ukcsd.uk.sun.com (George Ball- U.K. - Answer Centre contractor)

In article <1993Jun29.121838.1@drycas.club.cc.cmu.edu>, ghod@drycas.club.cc.cmu.edu
(Just GNU it.) writes:
|Oh Great Socket Gurus:
|
|I am porting some proven client server code to Solaris and am encountering
|an, ah, interesting bug. I have a server program which listens on a port (in
|the 5000 range) for multiple clients. After the first client connects, any
|additional clients that try to connect() fail with an errno of 126. To
|make matters worse, the port becomes "frozen" for about four
|minutes after the server shuts down. I use setsockopt()s for non-blocking,
|non-lingering, reuseaddr, and to change the socket buffer sizes. The problem
|is strange enough that I would suspect my coding, except that the same code
|works perfectly on SunOS, AIX, and FTX (Stratus). The code also seems to work
|fine when both the client(s) and the server are running on the same Solaris
|machine, but not when talking between two Solaris boxes. Weird.

I think I have come across this one. When the server port locks up, use
netstat -af inet to look at the status of all your network ports. I expect
you will see that your port is marked in the state TIME_WAIT.

If so, then I'm afraid it's not a bug but a ... feature... You're running
up against a part of the TCP protocol dealing with the release of
connections. Following the specs, there is a time delay of around 4
minutes before the connection is fully released. Earlier versions of
UNIX and SunOS didn't follow this directly, and the socket was released
almost immediately (I think it was about 30 seconds delay). Now in
Solaris 2, the new TCP implementation uses the default values which
result in this admittedly rather annoying delay.

I'm surprised that SO_REUSEADDR doesn't seem to work, but there is another
thing to try. Use the command

# ndd -set /dev/tcp tcp_close_wait_interval <some number of milliseconds>

which changes the length of time that the socket hangs around before
clearing up finally.

The default value is 120000 (milliseconds, i.e. 2 minutes), which leads to
the 4 minute delay (the delay is actually twice the value of this
parameter!). You can set it as low as 1000 for 1 second.
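
To make the arithmetic explicit, here is a small Perl sketch that follows
the description above and takes the delay to be twice the parameter value.
The sub name is hypothetical, and the doubling rule is taken from the text
above, not checked against the Solaris TCP implementation:

```perl
use strict;
use warnings;

# Hypothetical helper: expected hang time in seconds for a given
# tcp_close_wait_interval (in milliseconds), assuming the delay is
# twice the parameter value as stated above.
sub expected_delay_seconds {
    my ($interval_ms) = @_;
    return 2 * $interval_ms / 1000;
}

# Print the expected delay for the default and for the lowest setting
# mentioned above.
for my $ms (120000, 1000) {
    printf "%6d ms -> %g s\n", $ms, expected_delay_seconds($ms);
}
```

For the default of 120000 this gives 240 seconds, which matches the 4
minute delay reported in the original question.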


