Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

From: Luis Henriques
Date: Thu Jun 12 2014 - 08:56:03 EST


(Adding Tyler to the thread, as I should have done in the first place)

Willy Tarreau <w@xxxxxx> writes:

> Hi Luis,
>
> On Wed, Jun 11, 2014 at 07:46:44PM +0100, Luis Henriques wrote:
>> Hi Willy,
>>
>> On Mon, May 12, 2014 at 02:32:59AM +0200, Willy Tarreau wrote:
>> > 2.6.32-longterm review patch. If anyone has any objections, please let me know.
>> >
>>
>> During Ubuntu Lucid kernel regression testing, after the merge of
>> 2.6.32.62, we found problems with the following patches
>>
>> [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary
>> (Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da)
>>
>> [ 065/143] net: check net.core.somaxconn sysctl values
>> (Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509)
>>
>> The following two stack traces were found in kernel logs:
>
> Aie :-/
>
>> [ 0.199908] sysctl table check failed: /net/core/somaxconn .3.1.18 Missing strategy
>> [ 0.201100] Pid: 1, comm: swapper Not tainted 2.6.32-02063262-generic #201405200837
>> [ 0.202173] Call Trace:
> (...)
>> and here's a bug link:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326473
>
> I think that Tyler's suggest is the right approach.
>
>> For the Ubuntu Lucid kernel, we ended up reverting the offending
>> commits. Since I was able to reproduce this problem with a vanilla
>> 2.6.32.62, you may want to take a similar action for the next 2.6.32
>> release.
>
> The initial bug is hard to debug on live systems. I've been hit myself
> and it took me a lot of time to find the root cause. The problem is that
> the backlog is stored on an unsigned short while the sysctl is stored
> on an int, and the value is naturally truncated, so when you use an
> somaxconn of N*65536 + just a few, you end up with just a few and drop
> a lot of SYNs even under moderate loads. Worse, the only people who
> touch these values are those who run under high loads and who are the
> most likely to face the issue.
>
> Thus if there's a quick way to check that Tyler's fix reliably addresses
> the issue, I think we should take it instead. Of course I understand that
> in the mean time the revert is better for you!
>
> Regards,
> Willy
>

I was finally able to spend some more time with this and tried (a
modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
Although I haven't done any extended testing, I don't see the two
stack traces and the /proc/sys/net/ipv4/ directory seems to be
correctly populated.

I'm attaching the patch I've used, based on Tyler's.

Cheers,
--
LuÃs

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index e2eaf29..e6bf72c 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -121,7 +121,8 @@ static struct ctl_table netns_core_table[] = {
.mode = 0644,
.extra1 = &zero,
.extra2 = &ushort_max,
- .proc_handler = proc_dointvec_minmax
+ .proc_handler = proc_dointvec_minmax,
+ .strategy = &sysctl_intvec
},
{ .ctl_name = 0 }
};
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 910fa54..d957371 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -241,7 +241,8 @@ static struct ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &tcp_syn_retries_min,
- .extra2 = &tcp_syn_retries_max
+ .extra2 = &tcp_syn_retries_max,
+ .strategy = &sysctl_intvec
},
{
.ctl_name = NET_IPV4_NONLOCAL_BIND,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/