Re: [PATCH rcu/dev 3/3] net: Use call_rcu_flush() for dst_destroy_rcu

From: Joel Fernandes
Date: Thu Nov 17 2022 - 13:18:44 EST


On Thu, Nov 17, 2022 at 5:49 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Thu, Nov 17, 2022 at 9:42 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
>
> >
> > Yes, I agree. Your comments here have not been useful (or respectful)
> > so I am Ok with that.
> >
> > - Joel
>
> Well, I have discovered that some changes went in networking tree
> without network maintainers being involved nor CCed.
>
> What can I say ?
>
> It seems I have no say, right ?

Sorry, I take responsibility for that. FWIW, the rxrpc change is not
yet in Linus's tree.

Also FWIW, the rxrpc case came up because we detected that it does a
scheduler wakeup from the callback. We did both static and dynamic
testing to identify callbacks that do wakeups throughout the kernel
(kernel patch available on request), as the pattern observed is things
doing wakeups typically are for use cases that are not freeing memory
but something blocking, similar to synchronize_rcu(). So it was a
"trivial/obvious" change to make for rxrpc which I might have assumed
did not need much supervision because it just reverts that API to the
old behavior -- still probably no excuse.

Again, we can talk this out no problem. But I would strongly recommend
not calling it "crazy thing", as we did all due diligence for almost a
year (talking about it at LPC, working through various code paths and
bugs, 4 different patch redesigns on the idea (including the opt-in
that you are bringing up), including a late night debugging session to
figure this out etc).

Just to clarify, I know you review/maintain a lot of the networking
code and I really appreciate that (not praising just for the sake).
And I care about the kernel too, just like you.

thanks,

- Joel


> commit f32846476afbe1f296c41d036219178b3dfb6a9d
> Author: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> Date: Sun Oct 16 16:23:04 2022 +0000
>
> rxrpc: Use call_rcu_flush() instead of call_rcu()
>
> call_rcu() changes to save power may cause slowness. Use the
> call_rcu_flush() API instead which reverts to the old behavior.
>
> We find this via inspection that the RCU callback does a wakeup of a
> thread. This usually indicates that something is waiting on it. To be
> safe, let us use call_rcu_flush() here instead.
>
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
> index 22089e37e97f0628f780855f9e219e5c33d4afa1..fdcfb509cc4434b0781b76623532aff9c854ce60
> 100644
> --- a/net/rxrpc/conn_object.c
> +++ b/net/rxrpc/conn_object.c
> @@ -253,7 +253,7 @@ void rxrpc_kill_connection(struct rxrpc_connection *conn)
> * must carry a ref on the connection to prevent us getting here whilst
> * it is queued or running.
> */
> - call_rcu(&conn->rcu, rxrpc_destroy_connection);
> + call_rcu_flush(&conn->rcu, rxrpc_destroy_connection);
> }
>
> /*