Re: [PATCH rcu/dev 1/3] net: Use call_rcu_flush() for qdisc_free_cb

From: Joel Fernandes
Date: Thu Nov 17 2022 - 17:00:15 EST




> On Nov 17, 2022, at 4:44 PM, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> <joel@xxxxxxxxxxxxxxxxx> wrote:
>>
>> In a networking test on ChromeOS, we find that enabling the new CONFIG_RCU_LAZY
>> causes the test to fail in the teardown phase.
>>
>> The failure happens during: ip netns del <name>
>>
>> Using ftrace, I identified the callbacks being queued, which this series
>> addresses. Use call_rcu_flush() to revert to the old behavior. With that, the
>> test passes.
>>
>> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>> ---
>> net/sched/sch_generic.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index a9aadc4e6858..63fbf640d3b2 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -1067,7 +1067,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
>>
>> trace_qdisc_destroy(qdisc);
>>
>> - call_rcu(&qdisc->rcu, qdisc_free_cb);
>> + call_rcu_flush(&qdisc->rcu, qdisc_free_cb);
>> }
>
> I took a look at this one.
>
> qdisc_free_cb() essentially frees some per-cpu memory and the
> 'struct Qdisc'.
>
> I do not see why we need to force a flush for this (small?) piece of memory.

I’ll try dropping this patch and rerunning the test, and get back to you. It could be that this flush is compensating for a different callback. I am pretty sure that, at one point, dropping this patch made the test fail most of the time. With it applied, the test now passes 100% of the time.

I’ll also attempt to collect a complete trace; maybe I’ll learn some networking code in the process.

Thanks!