Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

From: Eric Dumazet
Date: Tue Aug 11 2020 - 11:33:04 EST




On 8/11/20 3:37 AM, Jason Xing wrote:
> Hi everyone,
>
> Could anyone take a look at this issue? I believe it is of high importance.
> Though Eric gave the proper patch a few months ago, the stable branch
> still hasn't applied or merged this fix. It seems this patch was
> forgotten :(


Sure, I'll take care of this shortly.

Thanks.

>
> Thanks,
> Jason
>
> On Thu, Jun 4, 2020 at 9:47 PM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>>
>> On Thu, Jun 4, 2020 at 9:10 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Jun 4, 2020 at 2:01 AM <kerneljasonxing@xxxxxxxxx> wrote:
>>>>
>>>> From: Jason Xing <kerneljasonxing@xxxxxxxxx>
>>>>
>>>> When using BBR mode, too many TCP socks cannot be released because
>>>> sock_hold() is called redundantly in tcp_internal_pacing() when an RTO
>>>> happens. As a result, slab memory grows rapidly and repeatedly
>>>> triggers the OOM killer until the system crashes.
>>>>
>>>> Besides BBR, any other mode that uses the pacing function could
>>>> trigger the problem described above.
>>>>
>>>> Reproduce procedure:
>>>> 0) cat /proc/slabinfo | grep TCP
>>>> 1) switch net.ipv4.tcp_congestion_control to bbr
>>>> 2) use the wrk tool (or something similar) to send packets
>>>> 3) use tc to increase the delay and loss to simulate the RTO case
>>>> 4) cat /proc/slabinfo | grep TCP
>>>> 5) kill the wrk command and observe the number of objects and slabs in
>>>> TCP
>>>> 6) finally, notice that the number does not decrease
>>>>
>>>> v2: extend the timer to cover all the related potential risks
>>>> (suggested by Eric Dumazet and Neal Cardwell)
>>>>
>>>> Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
>>>> Signed-off-by: liweishi <liweishi@xxxxxxxxxxxx>
>>>> Signed-off-by: Shujin Li <lishujin@xxxxxxxxxxxx>
>>>
>>> That is not how things work really.
>>>
>>> I will submit this properly so that stable teams do not have to guess
>>> how to backport this to various kernels.
>>>
>>> The changelog is misleading: this has nothing to do with BBR. We need to be precise.
>>>
>>
>> Thanks for your help. I can finally apply this patch to my kernel.
>>
>> Looking forward to your patchset :)
>>
>> Jason
>>
>>> Thank you.
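
For anyone repeating the reproduction steps quoted above (steps 0 to 6), the slab check can be sketched as a small shell helper. This is only an illustration, not part of the patch: the function name tcp_slab_counts is hypothetical, and the interface name, netem values and wrk arguments in the comments are assumptions to adapt to the test setup.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): print the active and total
# object counts of the "TCP" slab cache.
# slabinfo line layout: <name> <active_objs> <num_objs> <objsize> ...
# Unlike "grep TCP", the exact match on the first field skips caches
# such as TCPv6.
tcp_slab_counts() {
    # $1: slabinfo-format file; defaults to /proc/slabinfo, which
    # usually requires root to read.
    awk '$1 == "TCP" { print $2, $3 }' "${1:-/proc/slabinfo}"
}

# Illustrative sequence (assumed values, run as root on a test machine):
#   tcp_slab_counts                                   # step 0: baseline
#   sysctl -w net.ipv4.tcp_congestion_control=bbr     # step 1
#   wrk -t4 -c200 -d60s http://<server>/ &            # step 2
#   tc qdisc add dev eth0 root netem delay 200ms loss 10%   # step 3
#   tcp_slab_counts                                   # step 4
#   kill %1; sleep 120; tcp_slab_counts               # steps 5 and 6
```

If the counts printed after the load generator has been killed stay well above the baseline, the socks are presumably still being held, which matches the symptom described in the report.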