Patch to fix TCP window offerings with low rcvbuf.

Eric.Schenk@dna.lth.se
Mon, 12 May 1997 20:47:53 +0200


A while ago Keith Owens reported to me that the Linux TCP can get
into a situation where it offers a window smaller than 1 MSS
if the rcvbuf is 1 MSS.

The following patch changes the offered window from the nearest
smaller multiple of 1024 to the nearest smaller multiple of the
MSS. The goal here is twofold:

o Increase the chance of offering space for a full packet when running
near the limits.

o Fix the problems Keith observed.
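
To make the difference between the two rounding rules concrete, here is
a small stand-alone sketch. The helper names round_to_1k and
round_to_mss are made up for illustration and do not appear in the
kernel; the value 1460 used below is only an assumed, Ethernet-typical
MSS.

```c
#include <assert.h>

/* Old rule: once the window exceeds 1023, truncate it to a multiple of 1024. */
static unsigned long round_to_1k(unsigned long free_space)
{
	if (free_space > 1023)
		free_space &= ~1023UL;
	return free_space;
}

/* New rule: truncate to a whole number of segments (hypothetical helper). */
static unsigned long round_to_mss(unsigned long free_space, unsigned long mss)
{
	assert(mss > 0);	/* the patch forces minwin to at least 1 */
	return (free_space / mss) * mss;
}
```

With an assumed MSS of 1460 and exactly 1460 bytes free, round_to_1k()
yields 1024, which is less than one segment, while round_to_mss() yields
1460. With 3000 bytes free the results are 2048 and 2920 respectively.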

This patch applies on top of pre-2.0.31. Everyone who is testing
pre-2.0.31, please try this patch out and, as usual, report any
problems to me.

Thanks,

--
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

--- vanilla/linux/net/ipv4/tcp_output.c	Thu May  8 21:30:11 1997
+++ linux/net/ipv4/tcp_output.c	Sun May 11 01:26:41 1997
@@ -51,22 +51,10 @@
  * RECV.NEXT + RCV.WIN fixed until:
  * RCV.BUFF - RCV.USER - RCV.WINDOW >= min(1/2 RCV.BUFF, MSS)"
  *
- * Experiments against BSD and Solaris machines show that following
- * these rules results in the BSD and Solaris machines making very
- * bad guesses about how much data they can have in flight.
- *
- * Instead we follow the BSD lead and offer a window that gives
- * the size of the current free space, truncated to a multiple
- * of 1024 bytes. If the window is smaller than
- * 	min(sk->mss, MAX_WINDOW/2)
- * then we advertise the window as having size 0, unless this
- * would shrink the window we offered last time.
- * This results in as much as double the throughput as the original
- * implementation.
- *
  * We do BSD style SWS avoidance -- note that RFC1122 only says we
  * must do silly window avoidance, it does not require that we use
- * the suggested algorithm.
+ * the suggested algorithm. Following BSD avoids breaking header
+ * prediction.
  *
  * The "rcvbuf" and "rmem_alloc" values are shifted by 1, because
  * they also contain buffer handling overhead etc, so the window
@@ -74,33 +62,41 @@
  */
 int tcp_new_window(struct sock * sk)
 {
-	unsigned long window;
+	unsigned long window = sk->window;
 	unsigned long minwin, maxwin;
+	unsigned long free_space;
 
 	/* Get minimum and maximum window values.. */
 	minwin = sk->mss;
 	if (!minwin)
 		minwin = sk->mtu;
+	if (!minwin) {
+		printk(KERN_DEBUG "tcp_new_window: mss fell to 0.\n");
+		minwin = 1;
+	}
 	maxwin = sk->window_clamp;
 	if (!maxwin)
 		maxwin = MAX_WINDOW;
+
 	if (minwin > maxwin/2)
 		minwin = maxwin/2;
 
 	/* Get current rcvbuf size.. */
-	window = sk->rcvbuf/2;
-	if (window < minwin) {
+	free_space = sk->rcvbuf/2;
+	if (free_space < minwin) {
 		sk->rcvbuf = minwin*2;
-		window = minwin;
+		free_space = minwin;
 	}
 
 	/* Check rcvbuf against used and minimum window */
-	window -= sk->rmem_alloc/2;
-	if ((long)(window - minwin) < 0)		/* SWS avoidance */
-		window = 0;
+	free_space -= sk->rmem_alloc/2;
+	if ((long)(free_space - minwin) < 0)		/* SWS avoidance */
+		return 0;
+
+	/* Try to avoid the divide and multiply if we can */
+	if (window <= free_space - minwin || window > free_space)
+		window = (free_space/minwin)*minwin;
 
-	if (window > 1023)
-		window &= ~1023;
 	if (window > maxwin)
 		window = maxwin;
 	return window;
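
For anyone who wants to experiment with the new logic outside the
kernel, the patched function can be approximated in user space roughly
as follows. This is only a sketch: struct model_sock and
model_new_window are hypothetical stand-ins for struct sock and
tcp_new_window, MODEL_MAX_WINDOW is an assumed value for MAX_WINDOW,
and the printk is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_WINDOW 32767UL	/* assumed stand-in for MAX_WINDOW */

/* Hypothetical subset of struct sock, just the fields the patch reads. */
struct model_sock {
	unsigned long mss, mtu;
	unsigned long window;		/* window offered last time */
	unsigned long window_clamp;
	unsigned long rcvbuf, rmem_alloc;
};

static unsigned long model_new_window(struct model_sock *sk)
{
	unsigned long window, minwin, maxwin, free_space;

	assert(sk != NULL);
	window = sk->window;

	/* Minimum window: mss, else mtu, else 1 so the divide below is safe. */
	minwin = sk->mss ? sk->mss : sk->mtu;
	if (!minwin)
		minwin = 1;
	maxwin = sk->window_clamp ? sk->window_clamp : MODEL_MAX_WINDOW;
	if (minwin > maxwin / 2)
		minwin = maxwin / 2;

	/* Half of rcvbuf is treated as usable; grow it to fit one segment. */
	free_space = sk->rcvbuf / 2;
	if (free_space < minwin) {
		sk->rcvbuf = minwin * 2;
		free_space = minwin;
	}

	free_space -= sk->rmem_alloc / 2;
	if ((long)(free_space - minwin) < 0)	/* SWS avoidance */
		return 0;

	/* Keep the previous offer unless it is stale or no longer fits. */
	if (window <= free_space - minwin || window > free_space)
		window = (free_space / minwin) * minwin;

	if (window > maxwin)
		window = maxwin;
	return window;
}
```

With mss = 1460 and rcvbuf = 2920 (so that the halved buffer is exactly
one segment), the model offers 1460, where the old 1024-byte rounding
would have offered 1024. A previous offer of 2920 with 4000 bytes of
free space is kept as-is, since it is neither more than one segment
below the free space nor above it.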