Re: [PATCH] sched, fair: Allow a per-cpu kthread waking a task to stack on the same CPU

From: Dave Chinner
Date: Wed Jan 29 2020 - 17:00:30 EST


On Wed, Jan 29, 2020 at 06:38:52PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2020 at 09:10:12AM +0000, Mel Gorman wrote:
> > Peter, Ingo and Vincent -- I know the timing is bad due to the merge
> > window but do you have any thoughts on allowing select_idle_sibling to
> > stack a wakee task on the same CPU as a waker in this specific case?
>
> I sort of see, but *groan*...
>
> so if the kworker unlocks a contended mutex/rwsem/completion...
>
> I suppose the fact that it limits it to tasks that were running on the
> same CPU limits the impact if we do get it wrong.
>
> Elsewhere you write:
>
> > I would prefer the wakeup code did not have to signal that it's a
> > synchronous wakeup. Sync wakeups so exist but callers got it wrong many
> > times where stacking was allowed and then the waker did not go to sleep.
> > While the chain of events are related, they are not related in a very
> > obvious way. I think it's much safer to keep this as a scheduler
> > heuristic instead of depending on callers to have sufficient knowledge
> > of the scheduler implementation.
>
> That is true; the existing WF_SYNC has caused many issues for maybe
> being too strong.
>
> But what if we create a new hint that combines both these ideas? Say
> WF_COMPLETE and subject that to these same criteria. This way we can
> eliminate wakeups from locks and such (they won't have this set).
>
> Or am I just making things complicated again?

I suspect this is making it complicated again, because it requires
the people who maintain the code that is using workqueues to
understand when they might need to use a special wakeup interface in
the work function. And that includes code that currently calls
wake_up_all() because there can be hundreds of independent tasks
waiting on the IO completion (e.g all the wait queues in the XFS
journal code can (and do) have multiple threads waiting on them).

IOWs, requiring a special flag just to optimise this specific case
(i.e. single dependent waiter on same CPU as the kworker) when the
adverse behaviour is both hardware and workload dependent means it
just won't get used correctly or reliably.

Hence I'd much prefer the kernel detects and dynamically handles
this situation at runtime, because this pattern of workqueue usage
is already quite common and will only become more widespread as we
progress towards async processing of syscalls.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx