Re: [PATCH v4 09/10] workqueue: Implement system-wide nr_active enforcement for unbound workqueues

From: Tejun Heo
Date: Tue Jan 30 2024 - 23:03:02 EST


Hello,

Thanks for the report. Can you please test whether the following patch fixes
the problem?

----- 8< -----
From: Tejun Heo <tj@xxxxxxxxxx>
Subject: workqueue: Fix crash due to premature NUMA topology access on some archs

System workqueues are allocated early during boot from
workqueue_init_early(). While allocating unbound workqueues,
wq_update_node_max_active() is invoked from apply_workqueue_attrs() and
accesses NUMA topology information through cpumask_of_node() and
cpu_to_node().

At this point, the topology information has not been initialized yet, and on
arm and some other archs this leads to an oops like the following:

Unable to handle kernel paging request at virtual address ffff0002100296e0
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000000255a000
[ffff0002100296e0] pgd=18000001ffff7003, p4d=18000001ffff7003, pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-next-20240130+ #14392
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 600000c9 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : wq_update_node_max_active+0x50/0x1fc
lr : wq_update_node_max_active+0x1f0/0x1fc
...
Call trace:
 wq_update_node_max_active+0x50/0x1fc
 apply_wqattrs_commit+0xf0/0x114
 apply_workqueue_attrs_locked+0x58/0xa0
 alloc_workqueue+0x5ac/0x774
 workqueue_init_early+0x460/0x540
 start_kernel+0x258/0x684
 __primary_switched+0xb8/0xc0
Code: 9100a273 35000d01 53067f00 d0016dc1 (f8607a60)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Fix it by initializing wq->node_nr_active[].max to WQ_DFL_MIN_ACTIVE on
allocation and making wq_update_node_max_active() a no-op until
workqueue_init_topology(). Note that workqueue_init_topology() invokes
wq_update_node_max_active() on all unbound workqueues, so the final per-node
limits are still the same.

Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Reported-by: Nathan Chancellor <nathan@xxxxxxxxxx>
Link: http://lkml.kernel.org/r/91eacde0-df99-4d5c-a980-91046f66e612@xxxxxxxxxxx
Fixes: 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
---
kernel/workqueue.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9221a4c57ae1..a65081ec6780 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -386,6 +386,8 @@ static const char *wq_affn_names[WQ_AFFN_NR_TYPES] = {
 	[WQ_AFFN_SYSTEM] = "system",
 };
 
+static bool wq_topo_initialized = false;
+
 /*
  * Per-cpu work items which run for longer than the following threshold are
  * automatically considered CPU intensive and excluded from concurrency
@@ -1510,6 +1512,9 @@ static void wq_update_node_max_active(struct workqueue_struct *wq, int off_cpu)

 	lockdep_assert_held(&wq->mutex);
 
+	if (!wq_topo_initialized)
+		return;
+
 	if (!cpumask_test_cpu(off_cpu, effective))
 		off_cpu = -1;

@@ -4356,6 +4361,7 @@ static void free_node_nr_active(struct wq_node_nr_active **nna_ar)

 static void init_node_nr_active(struct wq_node_nr_active *nna)
 {
+	nna->max = WQ_DFL_MIN_ACTIVE;
 	atomic_set(&nna->nr, 0);
 	raw_spin_lock_init(&nna->lock);
 	INIT_LIST_HEAD(&nna->pending_pwqs);
@@ -7400,6 +7406,8 @@ void __init workqueue_init_topology(void)
 	init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache);
 	init_pod_type(&wq_pod_types[WQ_AFFN_NUMA], cpus_share_numa);
 
+	wq_topo_initialized = true;
+
 	mutex_lock(&wq_pool_mutex);
 
 	/*