Re: [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting

From: Jakub Kicinski
Date: Mon May 08 2023 - 21:33:32 EST


On Fri, 5 May 2023 17:16:10 -0700 Peilin Ye wrote:
> Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
> adds a flower filter X to A.
>
> Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
> b2) to replace A, then adds a flower filter Y to B.
>
> Thread 1 A's refcnt Thread 2
> RTM_NEWQDISC (A, RTNL-locked)
> qdisc_create(A) 1
> qdisc_graft(A) 9
>
> RTM_NEWTFILTER (X, RTNL-lockless)
> __tcf_qdisc_find(A) 10
> tcf_chain0_head_change(A)
> mini_qdisc_pair_swap(A) (1st)
> |
> | RTM_NEWQDISC (B, RTNL-locked)
> RCU 2 qdisc_graft(B)
> | 1 notify_and_destroy(A)
> |
> tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-lockless)
> qdisc_destroy(A) tcf_chain0_head_change(B)
> tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd)
> mini_qdisc_pair_swap(A) (3rd) |
> ... ...
>
> Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its
> mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during
> ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets
> on eth0 will not find filter Y in sch_handle_ingress().
>
> This is just one of the possible consequences of concurrently accessing
> net_device::miniq_{in,e}gress pointers. The point is clear, however:
> B's first call to mini_qdisc_pair_swap() should take place after A's
> last call, in qdisc_destroy().

Great analysis, thanks for squashing this bug.

Have you considered creating a fix more localized to the miniq
implementation? It seems that having per-device miniq pointers is
incompatible with using reference counted objects. So miniq is
a more natural place to solve the problem. Otherwise workarounds
in the core keep piling up (here qdisc_graft()).

Can we replace the rcu_assign_pointer in (3rd) with a cmpxchg()?
If active qdisc is neither a1 nor a2 we should leave the dev state
alone.