Re: minor patch for 2.1.88 net/core/sock.c

A.N.Kuznetsov (
Fri, 6 Mar 1998 14:17:19 +0300 (MSK)


> > The code was wrong, but the fix is wrong too.
> No, I think the fix is fine.

Yes, I was wrong. Yesterday I understood that the fix was correct.
I missed, that if wmem_alloc increased BEFORE kmalloc,
sock_kmalloc will never consume more than sndbuf in any case.

> > We cannot leave atomic_read(&sk->wmem_alloc) >= sk->sndbuf here
> Why? The "sndbuf" should be considered to be purely advisory.

We have no another method to make flow control and memory consumption
control. If sk->wmem_alloc >= sk->sndbuf socket must go to sleep.

> > (it can occur, if sk->wmem_alloc was increased by another process while sleep),
> > because it would block subsequent sock_alloc_send_skb forever.
> No, it should block only until the packet has been sent. The bug is
> something else - like the write allocation never going away or something.

The data will never be sent if memory has been consumed by sock_kmalloc.
Subsequent sock_wmalloc will fail because the space is clogged
with sock_kmalloc'ed data (it is if we would allow to overcommit it)

> It looks like sock_kmalloc() is just completely broken as designed, and
> should just be removed: there is one user of it, but it should probably
> use sock_wmalloc() instead which fixes up the counts correctly by having a
> correct skb->destroy function.

Yes, I proposed to use this approach, when Andy invented this
sock_kmalloc. Though skb's give some overhead, it looked the most nice for me.
Unfortunately, it appeared wrong.

(I beg pardon for boring explanation following below,
it is rather for myself to record the knowlegde :-))

sock_kmalloc is crucially different of sock_wmalloc.
sock_wmalloc'ed data are drained by networking devices (non-TCP)
or by protocol logic (TCP).

sock_kmalloc'ed data are drained only by allocator AFTER an operation
(mainly, user write to socket) is completed, so that if we would
fill all the available memory only with sock_kmalloc'ed data,
socket (in blocking mode) will hung forever, waiting for space.

The solution was the following:

sock_wmalloc checks for wmem_alloc < sndbuf (it allows overcommit
by one packet, so that if sndbuf is 1 byte, socket will transmit packet
by packet, but packets may have arbirtatry sizes)

sock_kmalloc checks for wmem_alloc + size < sndbuf, so that
part of wmem_alloc, consumed by it, never exceeds sndbuf and
we always may sock_wmalloc and transmit at least one packet.

OK. Bill's fix makes exactly this thing.

Actually, it has the only user only because another places
are not repaired yet. Well, I'd prefer to ignore the problem at all
(it is pretty silly to plug this hole, when we can eat all the memory
in linux in hundreds of another ways).

Moreover, it is broken in principle, because if sk->write_space callback
is more sophisticated (f.e. checking for wmem_alloc*2 < sndbuf
before wakeup would result in better schedule in many cases)
it could block socket forever. Besides that, it breaks (though, not
very seriously) normal blocking socket semantics, returning
ENOBUFS instead of waiting for space.

Alexey Kuznetsov

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to