Re: [PATCH] workqueue: Fix race in schedule and flush work

From: Tejun Heo
Date: Mon Feb 14 2022 - 15:54:15 EST


Hello,

> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33f1106b4f99..a3f53f859e9d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3326,28 +3326,38 @@ EXPORT_SYMBOL(cancel_delayed_work_sync);
> */
> int schedule_on_each_cpu(work_func_t func)
> {
> - int cpu;
> struct work_struct __percpu *works;
> + cpumask_var_t sched_cpumask;
> + int cpu, ret = 0;
>
> - works = alloc_percpu(struct work_struct);
> - if (!works)
> + if (!alloc_cpumask_var(&sched_cpumask, GFP_KERNEL))
> return -ENOMEM;
>
> + works = alloc_percpu(struct work_struct);
> + if (!works) {
> + ret = -ENOMEM;
> + goto free_cpumask;
> + }
> +
> cpus_read_lock();
>
> - for_each_online_cpu(cpu) {
> + cpumask_copy(sched_cpumask, cpu_online_mask);
> + for_each_cpu_and(cpu, sched_cpumask, cpu_online_mask) {

This definitely would need a comment explaining what's going on, because it
looks weird to copy a cpumask that is supposed to stay stable under
cpus_read_lock(). Given that this can only happen during early boot, and the
set of online CPUs can only grow at that point, maybe just add something like:

	if (early_during_boot) {
		for_each_possible_cpu(cpu)
			INIT_WORK(per_cpu_ptr(works, cpu), func);
	}

BTW, who's calling schedule_on_each_cpu() that early during boot? It makes
no sense to do this while the cpumasks can't be stabilized.

Thanks.

--
tejun