Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods

From: K Prateek Nayak
Date: Thu Jun 08 2023 - 23:43:40 EST


Hello Tejun,

On 6/9/2023 4:20 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 08, 2023 at 08:31:34AM +0530, K Prateek Nayak wrote:
>> [..snip..]
>> o I consistently see a WARN_ON_ONCE() in kick_pool() being hit when I
>> run "sudo ./stress-ng --iomix 96 --timeout 1m". I've seen few
>> different stack traces so far. Including all below just in case:
> ...
>> This is the same WARN_ON_ONCE() you had added in the HEAD commit:
>>
>> $ scripts/faddr2line vmlinux kick_pool+0xdb
>> kick_pool+0xdb/0xe0:
>> kick_pool at kernel/workqueue.c:1130 (discriminator 1)
>>
>> $ sed -n 1130,1132p kernel/workqueue.c
>> 	if (!WARN_ON_ONCE(wake_cpu >= nr_cpu_ids)) {
>> 		p->wake_cpu = wake_cpu;
>> 		get_work_pwq(work)->stats[PWQ_STAT_REPATRIATED]++;
>>
>> Let me know if you need any more data from my test setup.
>> P.S. The kernel is still up and running (~30min) despite hitting this
>> WARN_ON_ONCE() in my case :)
>
> Okay, that was me being stupid and not initializing the new fields for
> per-cpu workqueues. Can you please test the following branch? It should have
> both bugs fixed properly.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
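
For reference, pulling the branch down for testing is something like:

    $ git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
    $ git checkout FETCH_HEAD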

I've not run into any panics or warnings with this one. The kernel has been
stable for ~30min while running stress-ng iomix. We'll resume the testing
with v2 :)
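
For anyone reading this from the archives, my reading of the warning site,
paraphrased from the affinity-scopes branch (identifiers such as affn_strict
and __pod_cpumask come from this series and may differ between revisions;
sketch only, not the exact tree):

	/* kick_pool() locality path: steer the idle worker's wakeup
	 * toward a CPU inside the pool's pod. */
	if (!pool->attrs->affn_strict &&
	    !cpumask_test_cpu(p->wake_cpu, pool->attrs->__pod_cpumask)) {
		struct work_struct *work = list_first_entry(&pool->worklist,
						struct work_struct, entry);
		int wake_cpu = cpumask_any_distribute(pool->attrs->__pod_cpumask);

		/* An empty __pod_cpumask makes cpumask_any_distribute()
		 * return nr_cpu_ids, tripping the warning below. */
		if (!WARN_ON_ONCE(wake_cpu >= nr_cpu_ids)) {
			p->wake_cpu = wake_cpu;
			get_work_pwq(work)->stats[PWQ_STAT_REPATRIATED]++;
		}
	}

That would line up with per-cpu pools leaving the new fields uninitialized
before the fix: an empty pod mask yields wake_cpu == nr_cpu_ids, which is
exactly what the WARN_ON_ONCE() catches.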

>
> If that doesn't crash, I'd love to hear how it affects the perf regressions
> reported over the past few months.
>
> Thanks.
>
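
As an aside for the perf runs: assuming the wq_monitor.py script from
earlier in this series is still in the branch, it should let us confirm
whether workers are actually getting repatriated into their pods under
load, e.g. (invocation per the series; exact field names may differ):

    $ tools/workqueue/wq_monitor.py events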

--
Thanks and Regards,
Prateek