Re: [RFC PATCH 3/3] workqueue: Enable unbound cpumask update on ordered workqueues

From: Waiman Long
Date: Wed Jan 31 2024 - 12:02:50 EST


On 1/31/24 12:00, Tejun Heo wrote:
Hello,

On Tue, Jan 30, 2024 at 01:33:36PM -0500, Waiman Long wrote:
+/* requeue the work items stored in wq->o_list */
+static void requeue_ordered_works(struct workqueue_struct *wq)
+{
+	LIST_HEAD(head);
+	struct work_struct *work, *next;
+
+	raw_spin_lock_irq(&wq->o_lock);
+	if (list_empty(&wq->o_list))
+		goto unlock_out;	/* No requeuing is needed */
+
+	list_splice_init(&wq->o_list, &head);
+	raw_spin_unlock_irq(&wq->o_lock);
+
+	/*
+	 * Requeue the first batch of work items. Since it may take a while
+	 * to drain the old pwq and update the workqueue attributes, there
+	 * may be a rather long list of work items to process. So we allow
+	 * queue_work() callers to continue putting their work items in o_list.
+	 */
+	list_for_each_entry_safe(work, next, &head, entry) {
+		list_del_init(&work->entry);
+		local_irq_disable();
+		__queue_work_rcu_locked(WORK_CPU_UNBOUND, wq, work);
+		local_irq_enable();
+	}
+
+	/*
+	 * Now check if there are more work items queued, if so set ORD_WAIT
+	 * and force incoming queue_work() callers to busy wait until the 2nd
+	 * batch of work items have been properly requeued. It is assumed
+	 * that the 2nd batch should be much smaller.
+	 */
+	raw_spin_lock_irq(&wq->o_lock);
+	if (list_empty(&wq->o_list))
+		goto unlock_out;
+	WRITE_ONCE(wq->o_state, ORD_WAIT);
+	list_splice_init(&wq->o_list, &head);
+	raw_spin_unlock(&wq->o_lock);	/* Leave interrupt disabled */
+	list_for_each_entry_safe(work, next, &head, entry) {
+		list_del_init(&work->entry);
+		__queue_work_rcu_locked(WORK_CPU_UNBOUND, wq, work);
+	}
+	WRITE_ONCE(wq->o_state, ORD_NORMAL);
+	local_irq_enable();
+	return;
+
+unlock_out:
+	WRITE_ONCE(wq->o_state, ORD_NORMAL);
+	raw_spin_unlock_irq(&wq->o_lock);
+}
I'm not a big fan of this approach. It's a rather big departure from how
things are usually done in workqueue. I'd much prefer something like the
following:

- Add the ability to mark an unbound pwq plugged. If plugged,
pwq_tryinc_nr_active() always fails.

- When cpumasks need updating, set max_active of all ordered workqueues to
zero and flush them. If you set all max_actives to zero first (this can
be another "plug" flag on the workqueue), all the ordered workqueues
would already be draining, so calling flush_workqueue() on them
sequentially shouldn't take too long.

- Do the normal pwq allocation and linking but make sure that all new
ordered pwqs start plugged.

- When update is done, restore the max_actives on all ordered workqueues.

- New work items will now get queued to the newest dfl_pwq which is plugged
and we know that wq->pwqs list contains pwqs in reverse creation order. So,
from pwq_release_workfn(), if the pwq being released is for an ordered
workqueue and not plugged, unplug the pwq right in front.

This hopefully should be less invasive.
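
To make the plugging idea above concrete, here is a rough userspace sketch in plain C. It is only a toy model under stated assumptions, not kernel code: struct toy_pwq and every toy_* function are hypothetical names invented for illustration, the "newer" pointer stands in for the reverse-creation-order wq->pwqs list, and locking, RCU and the real pwq_tryinc_nr_active() logic are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pool_workqueue; NOT the kernel's structure. */
struct toy_pwq {
	bool plugged;		/* while set, nothing may become active */
	int nr_active;		/* work items currently executing */
	int max_active;		/* 1 models an ordered workqueue */
	int nr_pending;		/* queued but not yet active */
	struct toy_pwq *newer;	/* pwq created right after this one, or NULL */
};

/* Always fails on a plugged pwq, as in the first bullet above. */
static bool toy_pwq_tryinc_nr_active(struct toy_pwq *pwq)
{
	if (pwq->plugged || pwq->nr_active >= pwq->max_active)
		return false;
	pwq->nr_active++;
	return true;
}

/* Queue one work item: it either becomes active or stays pending. */
static void toy_queue_work(struct toy_pwq *pwq)
{
	if (!toy_pwq_tryinc_nr_active(pwq))
		pwq->nr_pending++;
}

/* Finish one active item, then try to activate a pending one. */
static void toy_complete_one(struct toy_pwq *pwq)
{
	assert(pwq->nr_active > 0);
	pwq->nr_active--;
	if (pwq->nr_pending > 0 && toy_pwq_tryinc_nr_active(pwq))
		pwq->nr_pending--;
}

/*
 * Models the release step of the last bullet: when an unplugged pwq
 * goes away, unplug the next newer pwq and kick its first pending item.
 */
static void toy_pwq_release(struct toy_pwq *pwq)
{
	struct toy_pwq *next = pwq->newer;

	if (pwq->plugged || !next || !next->plugged)
		return;
	next->plugged = false;
	if (next->nr_pending > 0 && toy_pwq_tryinc_nr_active(next))
		next->nr_pending--;
}
```

In this model, a work item queued to a new, plugged pwq only accumulates in nr_pending while the old pwq is still running its item. Once the old pwq completes its last item and toy_pwq_release() is called on it, the new pwq is unplugged and its first pending item becomes active, so the two pwqs never execute at the same time and ordering is kept.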

Thanks.

Thanks for the suggestion. I will rework the patch series to use this approach.

Cheers,
Longman