[PATCH -tip V4 1/8] workqueue: split cpuhotplug callbacks for unbound workqueue

From: Lai Jiangshan
Date: Mon Jan 11 2021 - 09:27:09 EST


From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>

Unbound workers are normally not per-CPU kthreads, but on CPU hotplug we
still need to update the pools of unbound workqueues for every workqueue,
based on whether the relevant NUMA node has any CPU online.

The current code reuses the CPU hotplug callbacks that were designed for
per-cpu workqueues, and they do not fit unbound workqueues/pools/workers well.

For example, workqueue_offline_cpu() runs very late; work items of unbound
workqueues might delay the offline process, or even worse, stall it due to
back-to-back work items that do not really need to be per-cpu.

It is also very bad when unbound workers are created after
sched_cpu_deactivate(): set_cpus_allowed_ptr() with an online&!active
cpumask (multiple CPUs) will trigger a warning, and nobody will deactivate
such late-spawned workers, which might lead to a later BUG_ON().
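
For illustration, the problematic path is roughly the following call
chain during worker creation (a simplified sketch of the existing code,
not something this patch adds):

	create_worker()
	  worker_attach_to_pool(worker, pool)
	    set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask)
	    /* the pool cpumask may contain CPUs that are online but
	     * already !active once sched_cpu_deactivate() has run */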

Similarly, workqueue_online_cpu() runs very early, so work items of
unbound workqueues might delay the online process. It is also very bad
when unbound workers are created before sched_cpu_activate():
set_cpus_allowed_ptr() with an online&!active cpumask (multiple CPUs)
will trigger a warning. Commit d945b5e9f0e ("workqueue: Fix setting
affinity of unbound worker threads") fixed some cases of the problem,
but left other cases unfixed and left a comment that does not match the
fixing code.

So we need to split the CPU hotplug callbacks for unbound workqueues and
put the new callbacks in the proper places.

Normally, we could split them and register them at CPUHP_AP_ONLINE_DYN,
but that does not solve the problem of set_cpus_allowed_ptr() with
online&!active cpumasks. So we have to use an offline callback that runs
earlier than sched_cpu_deactivate() and an online callback that runs later
than sched_cpu_activate().
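
For reference, the dynamic-state alternative would have looked roughly
like the sketch below; it is rejected because CPUHP_AP_ONLINE_DYN is
ordered before CPUHP_AP_ACTIVE, so the startup callback would still run
before sched_cpu_activate() and the teardown callback after
sched_cpu_deactivate():

	/* hypothetical registration at a dynamic hotplug state */
	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "workqueue_unbound:online",
			  workqueue_unbound_online_cpu,
			  workqueue_unbound_offline_cpu);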

This patch just introduces CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE and
splits the callbacks. The follow-up fixes are in the later patches.
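
With the new state, the resulting ordering of the workqueue callbacks
is roughly:

	bringup:  workqueue_online_cpu()          (CPUHP_AP_WORKQUEUE_ONLINE)
	          sched_cpu_activate()            (CPUHP_AP_ACTIVE)
	          workqueue_unbound_online_cpu()  (CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE)
	teardown: the corresponding teardown callbacks in reverse order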

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
---
include/linux/cpuhotplug.h | 4 ++++
include/linux/workqueue.h | 2 ++
kernel/cpu.c | 5 +++++
kernel/workqueue.c | 36 ++++++++++++++++++++++++++----------
4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 0042ef362511..ac2103deb20b 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -20,6 +20,9 @@
* | ^
* v |
* AP_ACTIVE AP_ACTIVE
+ * | ^
+ * v |
+ * ONLINE ONLINE
*/

enum cpuhp_state {
@@ -194,6 +197,7 @@ enum cpuhp_state {
 	CPUHP_AP_X86_HPET_ONLINE,
 	CPUHP_AP_X86_KVM_CLK_ONLINE,
 	CPUHP_AP_ACTIVE,
+	CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
 	CPUHP_ONLINE,
};

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 26de0cae2a0a..98300ddee308 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -663,6 +663,8 @@ static inline void wq_watchdog_touch(int cpu) { }
int workqueue_prepare_cpu(unsigned int cpu);
int workqueue_online_cpu(unsigned int cpu);
int workqueue_offline_cpu(unsigned int cpu);
+int workqueue_unbound_online_cpu(unsigned int cpu);
+int workqueue_unbound_offline_cpu(unsigned int cpu);
#endif

void __init workqueue_init_early(void);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 4e11e91010e1..f654ca0a104e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1665,6 +1665,11 @@ static struct cpuhp_step cpuhp_hp_states[] = {
 		.startup.single		= sched_cpu_activate,
 		.teardown.single	= sched_cpu_deactivate,
 	},
+	[CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE] = {
+		.name			= "workqueue_unbound:online",
+		.startup.single		= workqueue_unbound_online_cpu,
+		.teardown.single	= workqueue_unbound_offline_cpu,
+	},
#endif

/* CPU is fully up and running. */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9880b6c0e272..d7bdb7885e55 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5060,6 +5060,29 @@ int workqueue_prepare_cpu(unsigned int cpu)
 }

 int workqueue_online_cpu(unsigned int cpu)
+{
+	struct worker_pool *pool;
+
+	for_each_cpu_worker_pool(pool, cpu) {
+		mutex_lock(&wq_pool_attach_mutex);
+		rebind_workers(pool);
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
+
+	return 0;
+}
+
+int workqueue_offline_cpu(unsigned int cpu)
+{
+	/* unbinding per-cpu workers should happen on the local CPU */
+	if (WARN_ON(cpu != smp_processor_id()))
+		return -1;
+
+	unbind_workers(cpu);
+	return 0;
+}
+
+int workqueue_unbound_online_cpu(unsigned int cpu)
 {
 	struct worker_pool *pool;
 	struct workqueue_struct *wq;
@@ -5067,12 +5090,11 @@ int workqueue_online_cpu(unsigned int cpu)

 	mutex_lock(&wq_pool_mutex);

+	/* update CPU affinity of workers of unbound pools */
 	for_each_pool(pool, pi) {
 		mutex_lock(&wq_pool_attach_mutex);

-		if (pool->cpu == cpu)
-			rebind_workers(pool);
-		else if (pool->cpu < 0)
+		if (pool->cpu < 0)
 			restore_unbound_workers_cpumask(pool, cpu);

 		mutex_unlock(&wq_pool_attach_mutex);
@@ -5086,16 +5108,10 @@ int workqueue_online_cpu(unsigned int cpu)
 	return 0;
 }

-int workqueue_offline_cpu(unsigned int cpu)
+int workqueue_unbound_offline_cpu(unsigned int cpu)
 {
 	struct workqueue_struct *wq;

-	/* unbinding per-cpu workers should happen on the local CPU */
-	if (WARN_ON(cpu != smp_processor_id()))
-		return -1;
-
-	unbind_workers(cpu);
-
 	/* update NUMA affinity of unbound workqueues */
 	mutex_lock(&wq_pool_mutex);
 	list_for_each_entry(wq, &workqueues, list)
--
2.19.1.6.gb485710b