Re: [PATCH] workqueue: Restore cpus_allowed mask for sleeping workqueue rescue threads

From: Ripduman Sohan
Date: Thu Sep 15 2011 - 12:14:39 EST


Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello,
>
> On Thu, Sep 01, 2011 at 02:36:33PM +0100, Ripduman Sohan wrote:
> > Rescuer threads may be migrated (and are bound) to particular CPUs when
> > active. However, the allowed_cpus mask is not restored when they return
> > to sleep rendering inconsistent the presented and actual set of CPUs the
> > process may potentially run on. This patch fixes this oversight by
> > recording the allowed_cpus mask for rescuer threads when it enters the
> > rescuer_thread() main loop and restoring it every time the thread sleeps.
>
> Hmmm... so, currently, rescuer is left bound to the last cpu it worked
> on. Why is this a problem?
>
> Thanks.
>
> --
> tejun

Hi,

The rescuer being left bound to the last CPU it was active on is not a
problem in itself. As I pointed out in the commit log, the issue is that
the cpus_allowed mask is not restored when rescuers return to sleep, so
the set of CPUs presented for the process is inconsistent with the set
it may actually run on.

Perhaps an explanation is in order. I am working on a system where we
constantly sample process run-state (including the process
Cpus_allowed field in /proc/<pid>/status) to build a forward plan of
where the process _may_ run in the future. In situations of high
memory pressure (common on our setup), where the rescuers ran often, the
plan began to deviate significantly from the calculated schedule
because rescuer threads were marked as runnable only on a single CPU
when in reality they would bounce across CPUs.
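
For illustration only, a minimal sketch of the kind of sampling described
above. It is not our actual code: the function names are made up for this
example, and it assumes the mask is printed as comma-separated hex words
with the most significant word first.

```python
def parse_cpus_allowed(status_text):
    """Return the set of CPU indices set in the Cpus_allowed hex mask."""
    for line in status_text.splitlines():
        # The trailing colon keeps this from matching Cpus_allowed_list.
        if line.startswith("Cpus_allowed:"):
            # Assumption: comma-separated hex words, most significant first.
            hex_mask = line.split(":", 1)[1].strip().replace(",", "")
            mask = int(hex_mask, 16)
            return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}
    raise ValueError("no Cpus_allowed field found")


def sample_cpus_allowed(pid):
    """Read /proc/<pid>/status once and return the reported CPU set."""
    with open(f"/proc/{pid}/status") as f:
        return parse_cpus_allowed(f.read())
```

A rescuer that was last active on CPU 2 would be reported as {2} by such a
sampler even while it sleeps, which is the inconsistency in question.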

I've currently put in a special-case exception in our code to account
for the fact that rescuer threads may run on _any_ CPU regardless of
the current cpus_allowed mask but I thought it would be useful to
correct it. I'm happy to continue with my current approach if you
deem the patch irrelevant.
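
Roughly, the special case amounts to something like the sketch below. The
names are hypothetical, and how a thread is recognised as a rescuer is left
to the caller since that detail is specific to our tooling.

```python
def planned_cpus(reported, is_rescuer, online_cpus):
    """Return the CPU set the planner should assume for one thread.

    reported    -- CPU set sampled from the thread's cpus_allowed mask
    is_rescuer  -- hypothetical flag supplied by the caller
    online_cpus -- iterable of all CPUs the planner considers
    """
    if is_rescuer:
        # Rescuers may run on any CPU regardless of the sampled mask.
        return set(online_cpus)
    return set(reported)
```

With the mask restored on sleep, this override would no longer be needed.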

Kind regards,

--rip