Re: [PATCH 4/6] workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE

From: Peter Zijlstra
Date: Wed May 10 2023 - 11:10:18 EST


On Tue, May 09, 2023 at 05:07:50PM -1000, Tejun Heo wrote:

> @@ -976,24 +981,54 @@ void notrace wq_worker_stopping(struct task_struct *task, bool voluntary)
>
> pool = worker->pool;
>
> - /* Return if preempted before wq_worker_running() was reached */
> - if (worker->sleeping)
> - return;
> + if (!voluntary || task_is_running(task)) {
> + /*
> + * Concurrency-managed @worker is still RUNNING. See if the
> + * current work is hogging CPU stalling other per-cpu work
> + * items. If so, mark @worker CPU_INTENSIVE to exclude it from
> + * concurrency management. @worker->current_* are stable as they
> + * can only be modified by @task which is %current.
> + */
> + if (!worker->current_work ||
> + task->se.sum_exec_runtime - worker->current_at <
> + wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
> + return;
>

> @@ -2348,6 +2382,7 @@ __acquires(&pool->lock)
> worker->current_work = work;
> worker->current_func = work->func;
> worker->current_pwq = pwq;
> + worker->current_at = worker->task->se.sum_exec_runtime;

That only gets updated at scheduling events, it's not a 'running' clock.

> work_data = *work_data_bits(work);
> worker->current_color = get_work_color(work_data);
>


Anyway, it occurs to me that if all you want is to measure long running
works, then would it not be much easier to simply forward the tick?

Something like the below.. Then this tick handler (which will have just
updated ->sum_exec_runtime BTW) can do that above 'work-be-long-running'
check.

Or am I missing something? Seems simpler than hijacking preempt-out.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 944c3ae39861..3484cada9a4a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5632,6 +5632,9 @@ void scheduler_tick(void)

perf_event_task_tick();

+ if (curr->flags & PF_WQ_WORKER)
+ wq_worker_tick(curr);
+
#ifdef CONFIG_SMP
rq->idle_balance = idle_cpu(cpu);
trigger_load_balance(rq);