Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()

From: Vincent Guittot
Date: Fri Nov 05 2021 - 11:08:49 EST


On Thu, 4 Nov 2021 at 18:52, Vincent Donnefort
<vincent.donnefort@xxxxxxx> wrote:
>
> Nothing protects the access to the per_cpu variable sd_llc_id. When testing
> the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
> update_top_cache_domain(). One scenario being:
>
> CPU1 CPU2
> ==================================================================
>
> per_cpu(sd_llc_id, CPUX) => 0
> partition_sched_domains_locked()
> detach_destroy_domains()
> cpus_share_cache(CPUX, CPUX) update_top_cache_domain(CPUX)
> per_cpu(sd_llc_id, CPUX) => 0
> per_cpu(sd_llc_id, CPUX) = CPUX
> per_cpu(sd_llc_id, CPUX) => CPUX
> return false
>
> ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
> is a warning triggered from ttwu_queue_wakelist().
>
> Avoid a such race in cpus_share_cache() by always returning true when
> this_cpu == that_cpu.
>
> Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
> Reported-by: Jing-Ting Wu <jing-ting.wu@xxxxxxxxxxxx>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@xxxxxxx>
> Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>

Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>

>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f2611b9cf503..f5ca15cdcff4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3726,6 +3726,9 @@ void wake_up_if_idle(int cpu)
>
> bool cpus_share_cache(int this_cpu, int that_cpu)
> {
> + if (this_cpu == that_cpu)
> + return true;
> +
> return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
> }
>
> --
> 2.25.1
>