Re: Blocking for too long in SO_LINGER...

From: Steve Kann (stevek@SteveK.COM)
Date: Fri Jan 28 2000 - 11:08:39 EST


On Fri, Jan 28, 2000 at 05:38:05PM +0300, kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > I did a strace on the process, and found that it was blocking on the
> > close() of a socket with SO_LINGER set. I'm not sure if strace is
> > reporting this properly, but I think that the timeout being set is 8
> > seconds (which seems like an awfully bad choice by Real if so).
>
> No, 8 is option length. This old strace does not show SO_LINGER option.

Right, it is setting it to 0, see below:

> > Any idea what is happening here? Is there any kind of quick fix I can
> > do here? (I have a high-profile event tomorrow, and naturally, I don't
> > have the source to the silly application.)
>
> What is this pnserver? I suspect it sets linger to 0, which is illegal
> from all the viewpoints. From viewpoint of 2.2 it is infinite linger,
> from viewpoint of BSD and 2.3 it is immediate socket destruction.
>
> It is interesting compatibility question, try to get newer strace,
> probably it is better. Well, or check this with gdb. Or get my strace
> from ftp.inr.ac.ru/ip-routing.
>
> > Maybe I can quickly get the kernel
> > to ignore SO_LINGER either entirely (and I guess I'll find out what
> > breaks), or for just particular processes.
>
> Nothing will break. Programs using SO_LINGER correctly just do not exist
> in the nature. It is fully dead option.

Thanks, Alexey,

        pnserver is Real Networks' server product. I quickly put a
patch into sock.c which logged all SO_LINGER options, and selectively
disabled the setting of it in certain cases.

        It turns out that the program was setting the linger time to 0,
and I can see how that gets turned into an infinite timeout (in
schedule_timer(), I think).

        I think that the correct action is this, which I put into a
wrapper library last night:

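int setsockopt(int s, int level, int optname, const void *optval,
        socklen_t optlen)
{
        struct linger ling;

        /* Only touch SOL_SOCKET/SO_LINGER requests carrying a full struct. */
        if (level == SOL_SOCKET && optname == SO_LINGER &&
            optval != NULL && optlen >= sizeof(ling)) {
                /* Copy, so the caller's const buffer is left alone. */
                ling = *(const struct linger *) optval;
                if (ling.l_linger == 0) {
                        ling.l_onoff = 0;   /* zero timeout: no linger at all */
                        optval = &ling;
                        optlen = sizeof(ling);
                }
        }

        /* org_setsockopt: the real setsockopt(), resolved elsewhere. */
        return org_setsockopt(s, level, optname, optval, optlen);
}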

This basically just ignores the option altogether whenever the linger
time is 0.
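
For anyone wanting to try the same trick, here is a minimal sketch of how
such a wrapper library could be put together. It is not the exact code I
ran: it assumes the wrapper is loaded with LD_PRELOAD and that the real
setsockopt() is resolved with dlsym(RTLD_NEXT, ...), and the helper name
is_zero_linger() and the file name linger_shim.c are made up for
illustration.

```c
/* linger_shim.c: sketch of an LD_PRELOAD wrapper around setsockopt().
 * Assumed build line: gcc -shared -fPIC -o linger_shim.so linger_shim.c -ldl
 */
#define _GNU_SOURCE             /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

typedef int (*setsockopt_fn)(int, int, int, const void *, socklen_t);

/* Hypothetical helper: returns 1 when the request is SO_LINGER with
 * lingering switched on and a linger time of 0, otherwise 0. */
int is_zero_linger(int level, int optname, const void *optval,
                   socklen_t optlen)
{
        const struct linger *ling = optval;

        return level == SOL_SOCKET && optname == SO_LINGER &&
               optval != NULL && optlen >= sizeof(*ling) &&
               ling->l_onoff != 0 && ling->l_linger == 0;
}

int setsockopt(int s, int level, int optname, const void *optval,
               socklen_t optlen)
{
        static setsockopt_fn org_setsockopt;    /* the next definition */
        struct linger off = { 0, 0 };           /* lingering disabled */

        /* Look up the real setsockopt() once, on first use. */
        if (!org_setsockopt)
                org_setsockopt = (setsockopt_fn) dlsym(RTLD_NEXT, "setsockopt");
        if (!org_setsockopt) {
                errno = ENOSYS;
                return -1;
        }

        /* Replace {1,0} with "linger off"; pass everything else through. */
        if (is_zero_linger(level, optname, optval, optlen))
                return org_setsockopt(s, level, optname, &off, sizeof(off));
        return org_setsockopt(s, level, optname, optval, optlen);
}
```

Assuming the build line above, it would be used as
LD_PRELOAD=./linger_shim.so followed by the server's usual command line.
Passing a private copy instead of writing through optval keeps the
caller's buffer untouched.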

I think that David S Miller made the change to this section of code in
the kernel, which was definitely broken before:

net/ipv4/af_inet.c in 2.2.5:

                timeout = 0;
                if (sk->linger && !(current->flags & PF_EXITING)) {
                        timeout = MAX_SCHEDULE_TIMEOUT;

                        /* XXX This makes no sense whatsoever... -DaveM */
                        if (!sk->lingertime)
                                timeout = HZ*sk->lingertime;
                }

net/ipv4/af_inet.c in 2.2.12:
                timeout = 0;
                if (sk->linger && !(current->flags & PF_EXITING)) {
                        timeout = HZ * sk->lingertime;
                        if (!timeout)
                                timeout = MAX_SCHEDULE_TIMEOUT;
                }

So, formerly, if lingertime was set to 0, the timeout sent to tcp_close
was 0; otherwise, it was MAX_SCHEDULE_TIMEOUT. In newer kernels, if it
is set to 0, you get MAX_SCHEDULE_TIMEOUT, and if not, you get the
actual timeout value you asked for.
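
To make the difference concrete, here is a small user-space model of the
two snippets. It is only a sketch: the function names are made up, the
sk fields and the PF_EXITING test become plain parameters, HZ is assumed
to be 100 (the i386 value), and MAX_SCHEDULE_TIMEOUT is taken to be
LONG_MAX.

```c
#include <limits.h>

#define HZ 100                          /* assumed tick rate (i386) */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX   /* assumed, as in the kernel headers */

/* Model of the 2.2.5 snippet: a zero lingertime yields 0, and any
 * non-zero lingertime yields an unbounded wait. */
long close_timeout_2_2_5(int linger, long lingertime, int exiting)
{
        long timeout = 0;

        if (linger && !exiting) {
                timeout = MAX_SCHEDULE_TIMEOUT;
                if (!lingertime)
                        timeout = HZ * lingertime;      /* always 0 here */
        }
        return timeout;
}

/* Model of the 2.2.12 snippet: a non-zero lingertime is honoured, and a
 * zero lingertime becomes an unbounded wait. */
long close_timeout_2_2_12(int linger, long lingertime, int exiting)
{
        long timeout = 0;

        if (linger && !exiting) {
                timeout = HZ * lingertime;
                if (!timeout)
                        timeout = MAX_SCHEDULE_TIMEOUT;
        }
        return timeout;
}
```

With these assumptions, SO_LINGER {1,0} gives 0 in the 2.2.5 model and
MAX_SCHEDULE_TIMEOUT in the 2.2.12 model, while {1,8} gives
MAX_SCHEDULE_TIMEOUT in the 2.2.5 model and 800 ticks in the 2.2.12 one.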

This, of course, is completely undocumented, and doesn't seem to be what
BSD or Solaris does either. While I see people's point that it's a
bad idea to set SO_LINGER {1,0} anyway, maybe the kernel should just
ignore the 0, and keep the timeout at 0 in that case, as it did
previously. The section of code would then read:

net/ipv4/af_inet.c (with 0 timeouts staying 0):
                timeout = 0;
                if (sk->linger && !(current->flags & PF_EXITING))
                        timeout = HZ * sk->lingertime;

I suppose this is open for debate, and I'm not an authority here, but
here's what I think:

If someone writes code with SO_LINGER {1,0} they probably do _not_
expect to block at all on close(). Certainly, having them block
indefinitely is probably bad. If you want to block indefinitely, maybe
we could use a sentinel other than 0 -- i.e. -1, or LONG_MAX
(i.e. MAX_SCHEDULE_TIMEOUT).

Then, we could have that code read:

net/ipv4/af_inet.c (with 0 timeouts staying 0, and a sentinel for
infinite linger):
                timeout = 0;
                if (sk->linger && !(current->flags & PF_EXITING)) {
                        if (sk->lingertime == MAX_SCHEDULE_TIMEOUT)
                                timeout = MAX_SCHEDULE_TIMEOUT;
                        else
                                timeout = HZ * sk->lingertime;
                }
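
For reference, the application-side call all of this is about looks
roughly like the sketch below. The helper names set_zero_linger() and
get_linger() are made up for illustration; only the setsockopt() and
getsockopt() calls and struct linger are standard.

```c
#include <sys/socket.h>

/* Hypothetical helper: request SO_LINGER {1,0} on fd, i.e. lingering
 * switched on with a linger time of 0 seconds.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_zero_linger(int fd)
{
        struct linger ling;

        ling.l_onoff = 1;       /* lingering enabled */
        ling.l_linger = 0;      /* linger time in seconds */
        return setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
}

/* Hypothetical helper: read the current linger settings back into *out.
 * Returns 0 on success, -1 on error. */
int get_linger(int fd, struct linger *out)
{
        socklen_t len = sizeof(*out);

        return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```

What close() then does with that {1,0} is exactly the question: block
forever (2.2.12), or return at once, as it did in 2.2.5 and as Alexey
says BSD and 2.3 do.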

Alan, Dave?

-SteveK

-- 
    Steve Kann  - Horizon Live Distance Learning - 841 Broadway, Suite 502
   P:stevek@SteveK.COM - B:stevek@HorizonLive.com - R:KC2FBU (212) 533-1775
  "The box said 'Requires Windows 95, NT, or better,' so I installed Linux."
