Re: [PATCH] workqueue: fix invalid cpu in kick_pool

From: zhuangel570
Date: Mon Nov 20 2023 - 23:02:29 EST


Thanks, I have uploaded my configuration and console logs to the following
links, please check.

https://raw.githubusercontent.com/zhuangel/misc/main/debug/workqueue/console.log
https://raw.githubusercontent.com/zhuangel/misc/main/debug/workqueue/config-6.7.rc1
https://raw.githubusercontent.com/zhuangel/misc/main/debug/workqueue/config-4.18.0-348.el8.x86_64

The issue was first discovered in my BM machine and for ease of debugging,
I ran a virtual machine of the same case and reproduced it. My test virtual
machine was installed from centos 8.5.2111 DVD (origin kernel is 4.18.0-348)
and then the kernel was updated from the 6.7.rc1 source code. The virtual
machine ran on 4 CPU, 8G memory and some virtio devices.

My investigation show, when "workqueue.unbound_cpus" and "isolcpus" are
configured as same cpuset, this will make the "wq_unbound_cpumask" as an
empty set, when some idle work task try to set "wake_cpu" from
"cpumask_any_distribute", an invalid CPU will be set, then may trigger
panic.

To be honestly, I am not really known why there is a "not-present page"
exception, after I remove "workqueue.unbound_cpus" from command line or
apply this patch to the running kernel, the system could boot successfully.

On Tue, Nov 21, 2023 at 3:07 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Mon, Nov 20, 2023 at 08:16:23PM +0800, Yong He wrote:
> > With incorrect unbound workqueue configurations, this may introduce kernel
> > panic, because cpumask_any_distribute() will not always return a valid cpu,
> > such as one set the 'isolcpus' and 'workqueue.unbound_cpus' into the same
> > cpuset, and this will make the @pool->attrs->__pod_cpumask an empty set,
> > then trigger panic like this:
>
> This shouldn't have happened. Can you share the configuration and the full
> dmesg? Let's fix the problem at the source.
>
> Thanks.
>
> --
> tejun



--
——————————
zhuangel570
——————————