Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()

From: Peter Zijlstra
Date: Mon Nov 08 2021 - 06:22:13 EST


On Fri, Nov 05, 2021 at 04:08:33PM +0100, Vincent Guittot wrote:
> On Thu, 4 Nov 2021 at 18:52, Vincent Donnefort
> <vincent.donnefort@xxxxxxx> wrote:
> >
> > Nothing protects the access to the per_cpu variable sd_llc_id. When testing
> > the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
> > update_top_cache_domain(). One scenario being:
> >
> > CPU1 CPU2
> > ==================================================================
> >
> > per_cpu(sd_llc_id, CPUX) => 0
> > partition_sched_domains_locked()
> > detach_destroy_domains()
> > cpus_share_cache(CPUX, CPUX) update_top_cache_domain(CPUX)
> > per_cpu(sd_llc_id, CPUX) => 0
> > per_cpu(sd_llc_id, CPUX) = CPUX
> > per_cpu(sd_llc_id, CPUX) => CPUX
> > return false
> >
> > ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
> > is a warning triggered from ttwu_queue_wakelist().
> >
> > Avoid a such race in cpus_share_cache() by always returning true when
> > this_cpu == that_cpu.
> >
> > Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
> > Reported-by: Jing-Ting Wu <jing-ting.wu@xxxxxxxxxxxx>
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@xxxxxxx>
> > Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>
>
> Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f2611b9cf503..f5ca15cdcff4 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3726,6 +3726,9 @@ void wake_up_if_idle(int cpu)
> >
> > bool cpus_share_cache(int this_cpu, int that_cpu)
> > {
> > + if (this_cpu == that_cpu)
> > + return true;
> > +
> > return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
> > }

Blergh, that's annoying. Thanks guys, picked it up for /urgent