Re: [PATCH] kthread_worker: Flush all delayed works when destroy kthread worker

From: Petr Mladek
Date: Tue Jan 03 2023 - 07:24:20 EST


On Fri 2022-12-23 21:16:01, Zqiang wrote:
> When destroy a kthread worker, only flush all current works on
> kthread worker, this is not very sufficient, there may be some
> delayed works in the pending state,

Great catch!

> this commit therefore add
> flush delayed works function in kthread_destroy_worker().
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1375,6 +1375,35 @@ void kthread_flush_worker(struct kthread_worker *worker)
> }
> EXPORT_SYMBOL_GPL(kthread_flush_worker);
>
> +/**
> + * kthread_flush_delayed_works - flush all current delayed works on a
> + * kthread_worker.
> + * @worker: worker to flush
> + *
> + * Wait until all currently executing or pending delayed works are
> + * queued completed.

It is not clear to me what "queued completed" means. I am not a native
speaker but this does not look like a meaningful English.

My understanding is that the function queues all pending delayed
work items immediately. It does not wait until they are proceed.

I am not sure if this is a correct behavior. The delayed work
items are often used for periodic events. The work usually
queues itself to be proceed later again. In this case,
it rather should get canceled.

Another problem is that the kthread_worker API tries to behave
the same way as the classic workqueue API. And destroy_workqueue()
ignores delayed works. Well, it might be an implementation limitation
because the classic workqueues do not have list of delayed work
items.

A compromise might be to just add the warning in
kthread_destroy_worker() when worker->delayed_work_list is not empty.
The caller will be responsible for queuing or canceling all delayed
work items before calling kthread_destroy_worker().

We should also document this expectation in the description
of kthread_destroy_worker().

> + */
> +void kthread_flush_delayed_works(struct kthread_worker *worker)
> +{
> + unsigned long flags;
> + struct kthread_delayed_work *dwork;
> + struct kthread_work *work;
> +
> + raw_spin_lock_irqsave(&worker->lock, flags);
> + while (!list_empty(&worker->delayed_work_list)) {
> + work = list_first_entry(&worker->delayed_work_list,
> + struct kthread_work, node);
> + list_del_init(&work->node);

This is racy. It removes the work from the list before deleting
the timer. kthread_delayed_work_timer_fn() would warn when
it gets triggered in the meantime.

A safe solution would be to delete the timer using
kthread_cancel_delayed_work_timer(). It would make sure
that the work will not be queued when timer is proceed
in the meantime.

> + dwork = container_of(work, struct kthread_delayed_work, work);
> + raw_spin_unlock_irqrestore(&worker->lock, flags);
> + if (del_timer_sync(&dwork->timer))
> + kthread_queue_work(worker, &dwork->work);
> + raw_spin_lock_irqsave(&worker->lock, flags);
> + }
> + raw_spin_unlock_irqrestore(&worker->lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(kthread_flush_delayed_works);
> +
> /**
> * kthread_destroy_worker - destroy a kthread worker
> * @worker: worker to be destroyed
> @@ -1391,8 +1420,10 @@ void kthread_destroy_worker(struct kthread_worker *worker)
> if (WARN_ON(!task))
> return;
>
> + kthread_flush_delayed_works(worker);
> kthread_flush_worker(worker);
> kthread_stop(task);
> + WARN_ON(!list_empty(&worker->delayed_work_list));

I would personally just add this warning.

Also we should update the function description. The caller is
responsible for handling delayed work items. They should either
be queued immediately or canceled when this function
is called.

> WARN_ON(!list_empty(&worker->work_list));
> kfree(worker);

Best Regards,
Petr