Re: [PATCH v2] tcp: fix connection reset due to tw hashdance race.

From: Eric Dumazet
Date: Thu Jun 08 2023 - 00:13:24 EST


On Thu, Jun 8, 2023 at 5:59 AM Duan,Muquan <duanmuquan@xxxxxxxxx> wrote:
>
> Hi, Eric,
>
> Thanks a lot for your explanation!
>
> Even if we add a reader lock, if the refcnt is set outside spin_lock()/spin_unlock(), then during the interval between spin_unlock() and refcnt_set(), other CPUs will see the tw sock with refcnt 0, and the refcnt validation will fail.
>
> A suggestion: before the tw sock is added to the ehash table, it is already used by the tw timer and the bhash chain, so we can first set the refcnt to 2 before adding the tw to the ehash table, or increment the refcnt one by one for the timer, bhash and ehash. This avoids the refcnt validation failure on other CPUs.
>
> This reduces the frequency of the connection reset issue from once every 20 min to once every 180 min for our product. We may wait quite a long time before the best solution is ready; if this obvious defect is fixed, userland applications can benefit from it.
>
> Looking forward to your opinions!

Again, my opinion is that we need a proper fix, not workarounds.

I will work on this a bit later.

In the meantime you can apply your patch locally if you feel this is
what you want.
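
For illustration only, the ordering concern raised in the quoted message can be modeled in userspace roughly as below. This is a sketch under stated assumptions, not kernel code: struct tw_model, struct bucket_model and all four helpers are hypothetical names, C11 atomics stand in for the kernel refcount helpers, and neither the ehash lock nor RCU is modeled. It only shows that a lookup which validates the refcount fails while the entry is visible with a count of 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a timewait socket (not the kernel struct). */
struct tw_model {
    atomic_int refcnt;        /* 0 means "not yet safe to take a reference" */
    struct tw_model *next;    /* chain link inside the model bucket */
};

/* Hypothetical single-bucket table standing in for one ehash chain. */
struct bucket_model {
    struct tw_model *head;
};

/* Take a reference only if the count is nonzero, in the spirit of an
 * "increment unless zero" refcount helper. */
static bool tw_get_unless_zero(struct tw_model *tw)
{
    int old = atomic_load(&tw->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&tw->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Make the entry visible to lookups. */
static void tw_insert(struct bucket_model *b, struct tw_model *tw)
{
    tw->next = b->head;
    b->head = tw;
}

/* Set the initial reference count (one per holder: timer, bhash, ehash). */
static void tw_set_refs(struct tw_model *tw, int n)
{
    assert(n > 0);
    atomic_store(&tw->refcnt, n);
}

/* A lookup that validates the refcount, as a reader on another CPU would. */
static struct tw_model *tw_lookup(struct bucket_model *b)
{
    for (struct tw_model *tw = b->head; tw != NULL; tw = tw->next) {
        if (tw_get_unless_zero(tw))
            return tw;
    }
    return NULL;
}
```

In this model, calling tw_insert() before tw_set_refs() leaves a window in which tw_lookup() returns NULL for an entry that is already in the chain; calling tw_set_refs() first closes that window. Whether this ordering alone is sufficient in the real code path is exactly what is under discussion in this thread.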