Re: [PATCH] workqueue: Fix race in schedule and flush work

From: Paul E. McKenney
Date: Wed Feb 16 2022 - 14:07:10 EST


On Wed, Feb 16, 2022 at 07:49:39PM +0100, Padmanabha Srinivasaiah wrote:
> On Mon, Feb 14, 2022 at 09:43:52AM -1000, Tejun Heo wrote:
> > Hello,
> >
> > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > > index 33f1106b4f99..a3f53f859e9d 100644
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -3326,28 +3326,38 @@ EXPORT_SYMBOL(cancel_delayed_work_sync);
> > >   */
> > >  int schedule_on_each_cpu(work_func_t func)
> > >  {
> > > -	int cpu;
> > >  	struct work_struct __percpu *works;
> > > +	cpumask_var_t sched_cpumask;
> > > +	int cpu, ret = 0;
> > >
> > > -	works = alloc_percpu(struct work_struct);
> > > -	if (!works)
> > > +	if (!alloc_cpumask_var(&sched_cpumask, GFP_KERNEL))
> > >  		return -ENOMEM;
> > >
> > > +	works = alloc_percpu(struct work_struct);
> > > +	if (!works) {
> > > +		ret = -ENOMEM;
> > > +		goto free_cpumask;
> > > +	}
> > > +
> > >  	cpus_read_lock();
> > >
> > > -	for_each_online_cpu(cpu) {
> > > +	cpumask_copy(sched_cpumask, cpu_online_mask);
> > > +	for_each_cpu_and(cpu, sched_cpumask, cpu_online_mask) {
> >
> > This definitely would need a comment explaining what's going on cuz it looks
> > weird to be copying the cpumask which is supposed to stay stable due to the
> > cpus_read_lock(). Given that it can only happen during early boot and the
> > online cpus can only be expanding, maybe just add sth like:
> >
> > 	if (early_during_boot) {
> > 		for_each_possible_cpu(cpu)
> > 			INIT_WORK(per_cpu_ptr(works, cpu), func);
> > 	}
> >
>
> Thanks, Tejun, for the reply and suggestions.
>
> Yes, unfortunately cpus_read_lock() does not keep the cpumask stable
> during secondary boot. I am not sure, but maybe it only guarantees that
> a CPU doesn't go down between cpus_read_lock() and cpus_read_unlock().
>
> As suggested, I will try out something like:
> 	if (system_state != RUNNING) {
> 		:
> 	}
>
> > BTW, who's calling schedule_on_each_cpu() that early during boot. It makes
> > no sense to do this while the cpumasks can't be stabilized.
> >
> It is the implementation of CONFIG_TASKS_RUDE_RCU.

Another option would be to adjust CONFIG_TASKS_RUDE_RCU based on where
things are in the boot process. For example:

// Wait for one rude RCU-tasks grace period.
static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
{
	if (num_online_cpus() <= 1)
		return;	// Fastpath for only one CPU.
	rtp->n_ipis += cpumask_weight(cpu_online_mask);
	schedule_on_each_cpu(rcu_tasks_be_rude);
}
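
To make the window concrete, here is a rough userspace model in C, not
kernel code. It assumes the race is a CPU coming online between the
queueing loop and the flush loop of schedule_on_each_cpu(), so that the
flush touches a work item that was never initialized. The names
model_work, model_schedule_on_each_cpu(), and model_rude_wait_gp() are
made up for this illustration, and the masks are plain bitmaps rather
than real cpumasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Userspace stand-in for struct work_struct; not the kernel type. */
struct model_work {
	bool initialized;	/* set by the INIT_WORK() analogue */
};

/* Bit n set means CPU n is online. */
typedef unsigned int model_cpumask;

static bool cpu_in_mask(int cpu, model_cpumask mask)
{
	return (mask >> cpu) & 1u;
}

static int mask_weight(model_cpumask mask)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		n += cpu_in_mask(cpu, mask);
	return n;
}

/*
 * Model of schedule_on_each_cpu() when the online mask differs between
 * the queueing loop and the flush loop. Returns how many flushes hit a
 * work item that was never initialized. With init_all_possible set,
 * every possible CPU's item is initialized up front, as in the
 * for_each_possible_cpu() suggestion quoted above.
 */
static int model_schedule_on_each_cpu(model_cpumask online_at_queue,
				      model_cpumask online_at_flush,
				      bool init_all_possible)
{
	struct model_work works[NR_CPUS];
	int cpu, bad_flushes = 0;

	assert(online_at_queue < (1u << NR_CPUS));
	assert(online_at_flush < (1u << NR_CPUS));
	memset(works, 0, sizeof(works));

	if (init_all_possible)
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			works[cpu].initialized = true;

	/* Queueing loop: initialize items for CPUs online right now. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_in_mask(cpu, online_at_queue))
			works[cpu].initialized = true;

	/* Flush loop: walks whatever is online by this point. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_in_mask(cpu, online_at_flush) && !works[cpu].initialized)
			bad_flushes++;

	return bad_flushes;
}

/* Model of the fastpath: skip the scheduling when only one CPU is online. */
static int model_rude_wait_gp(model_cpumask online_at_entry,
			      model_cpumask online_at_flush)
{
	if (mask_weight(online_at_entry) <= 1)
		return 0;
	return model_schedule_on_each_cpu(online_at_entry, online_at_flush, false);
}
```

For example, model_schedule_on_each_cpu(0x1, 0x3, false) counts one
flush of an uninitialized item, while passing true for
init_all_possible, or going through model_rude_wait_gp(0x1, 0x3),
counts none.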

Easy enough either way!

Thanx, Paul