Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling

From: Yu Liao
Date: Thu Aug 24 2023 - 03:26:20 EST


On 2023/8/23 18:14, Thomas Gleixner wrote:
> Subject: cpu/hotplug: Prevent self deadlock on CPU hot-unplug
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Wed, 23 Aug 2023 10:47:02 +0200
>
> Xiongfeng reported and debugged a self deadlock of the task which initiates
> and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
>
> CPU1 CPU2
>
> T1 sets cfs_quota
> starts hrtimer cfs_bandwidth 'period_timer'
> T1 is migrated to CPU2
> T1 initiates offlining of CPU1
> Hotplug operation starts
> ...
> 'period_timer' expires and is re-enqueued on CPU1
> ...
> take_cpu_down()
> CPU1 shuts down and does not handle timers
> anymore. They have to be migrated in the
> post dead hotplug steps by the control task.
>
> T1 runs the post dead offline operation
> T1 is scheduled out
> T1 waits for 'period_timer' to expire
>
> T1 waits there forever if it is scheduled out before it can execute the hrtimer
> offline callback hrtimers_dead_cpu().
>
> Cure this by delegating the hotplug control operation to a worker thread on
> an online CPU. This takes the initiating user space task, which might be
> affected by the bandwidth timer, completely out of the picture.
>
> Reported-by: Xiongfeng Wang <wangxiongfeng2@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@xxxxxxxxxx
> ---
> kernel/cpu.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1467,8 +1467,22 @@ static int __ref _cpu_down(unsigned int
> return ret;
> }
>
> +struct cpu_down_work {
> + unsigned int cpu;
> + enum cpuhp_state target;
> +};
> +
> +static long __cpu_down_maps_locked(void *arg)
> +{
> + struct cpu_down_work *work = arg;
> +
> + return _cpu_down(work->cpu, 0, work->target);
> +}
> +
> static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
> {
> + struct cpu_down_work work = { .cpu = cpu, .target = target, };
> +
> /*
> * If the platform does not support hotplug, report it explicitly to
> * differentiate it from a transient offlining failure.
> @@ -1477,7 +1491,15 @@ static int cpu_down_maps_locked(unsigned
> return -EOPNOTSUPP;
> if (cpu_hotplug_disabled)
> return -EBUSY;
> - return _cpu_down(cpu, 0, target);
> +
> + /*
> + * Ensure that the control task does not run on the to be offlined
> + * CPU to prevent a deadlock against cfs_b->period_timer.
> + */
> + cpu = cpumask_any_but(cpu_online_mask, cpu);
> + if (cpu >= nr_cpu_ids)
> + return -EBUSY;
> + return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
> }
>
> static int cpu_down(unsigned int cpu, enum cpuhp_state target)

Thanks for the patch. Tested in v6.5-rc5 with test script provided by
Xiongfeng, this patch works.

Tested-by: Yu Liao <liaoyu15@xxxxxxxxxx>

Best regards,
Yu