Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

From: Juri Lelli
Date: Tue Dec 12 2023 - 04:56:15 EST


Hello,

Thanks for the quick reply!

On 11/12/23 08:39, Tejun Heo wrote:
> Hello,
>
> On Mon, Dec 11, 2023 at 03:51:57PM +0100, Juri Lelli wrote:
> > Guess this is a requirement because, if workqueue processing is stuck
> > for some reason, getting rescuers to run on the same set of cpus
> > workqueues have been restricted to already doesn't really have good
> > chances of making any progress?
>
> The only problem rescuers try to solve is deadlocks caused by lack of
> memory, so on the cpu side, it just follows whatever worker pool it's trying
> to help.
>
> > Wonder if we still might need some sort of fail hard/warn mode in case
> > strict isolation is in place? Or maybe we have that already?
>
> For both percpu and unbound workqueues, the rescuers just follow whatever
> pool it's trying to help at the moment, so it shouldn't cause any surprises
> in terms of isolation. It just temporarily joins the already active but
> stuck pool.

Hummm, OK, but in terms of which CPU the rescuer might be woken up on,
how do we make sure that the wakeup always happens on housekeeping CPUs
(assuming unbound workqueues have been restricted to those)?

AFAICS, we have

  send_mayday() ->
    wake_up_process(wq->rescuer->task)

and the rescuer task is not affined to the cpumask of the workqueue it's
called to rescue, so in theory it can be woken up anywhere?

Thanks,
Juri