Re: [PATCH v3 3/6] sched: Change wait_task_inactive()s match_state

From: Peter Zijlstra
Date: Tue Sep 06 2022 - 06:55:24 EST


On Sun, Sep 04, 2022 at 12:44:36PM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > Make wait_task_inactive()'s @match_state work like ttwu()'s @state.
> >
> > That is, instead of an equal comparison, use it as a mask. This allows
> > matching multiple block conditions.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > kernel/sched/core.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
> > * is actually now running somewhere else!
> > */
> > while (task_running(rq, p)) {
> > - if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> > + if (match_state && !(READ_ONCE(p->__state) & match_state))
> > return 0;
>
> We lose the unlikely annotation there - but I guess it probably never
> really mattered anyway?

So any wait_task_inactive() caller does want that case to be true, but
the whole match_state precondition mostly wrecks things anyway. If
anything it should've been:

	if (likely(match_state && !(READ_ONCE(p->__state) & match_state)))
		return 0;

but I can't find it in me to care too much here.
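
For anyone following along, the difference between the old equality test and the new mask test can be sketched in userspace roughly as below. This is only an illustration: the SKETCH_* constants and the helper names are made up for the example (the values are chosen to look like state bits), and READ_ONCE() and the likely()/unlikely() hints are left out since they don't change the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits, defined locally for this sketch only. */
#define SKETCH_INTERRUPTIBLE   0x0001u
#define SKETCH_UNINTERRUPTIBLE 0x0002u
#define SKETCH_STOPPED         0x0004u

/* Old check: bail out unless the state equals match_state exactly. */
static inline bool sketch_old_bail(unsigned int state, unsigned int match_state)
{
	return match_state && state != match_state;
}

/* New check: bail out only if no bit of match_state is set in state. */
static inline bool sketch_new_bail(unsigned int state, unsigned int match_state)
{
	return match_state && !(state & match_state);
}

/* Small self-check showing where the two checks differ. */
static inline void sketch_mask_selfcheck(void)
{
	unsigned int either = SKETCH_INTERRUPTIBLE | SKETCH_UNINTERRUPTIBLE;

	/* A caller that accepts either sleep state: equality bails, mask does not. */
	assert(sketch_old_bail(SKETCH_UNINTERRUPTIBLE, either));
	assert(!sketch_new_bail(SKETCH_UNINTERRUPTIBLE, either));

	/* A state outside the mask makes both checks bail. */
	assert(sketch_old_bail(SKETCH_STOPPED, either));
	assert(sketch_new_bail(SKETCH_STOPPED, either));
}
```

With a single-bit match_state and a state that has only that bit set, both helpers agree; they diverge once the caller passes more than one bit, which is the "matching multiple block conditions" case from the changelog.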

> Suggestion #1:
>
> - Shouldn't we rename task_running() to something like task_on_cpu()? The
> task_running() primitive is similar to TASK_RUNNING but is not based off
> any TASK_FLAGS.

That looks like a simple enough patch, lemme go do that.

> Suggestion #2:
>
> - Shouldn't we eventually standardize on task->on_cpu on UP kernels too?
> They don't really matter anymore, and doing so removes #ifdefs and makes
> the code easier to read.

Probably, but that sounds like something that'll spiral out of control
real quick, so I'll leave that on the TODO list somewhere.

> > cpu_relax();
> > }
> > @@ -3310,7 +3310,7 @@ unsigned long wait_task_inactive(struct
> > running = task_running(rq, p);
> > queued = task_on_rq_queued(p);
> > ncsw = 0;
> > - if (!match_state || READ_ONCE(p->__state) == match_state)
> > + if (!match_state || (READ_ONCE(p->__state) & match_state))
> > ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
> > task_rq_unlock(rq, p, &rf);
>
> Suggestion #3:
>
> - Couldn't the following users with a 0 mask:
>
> drivers/powercap/idle_inject.c: wait_task_inactive(iit->tsk, 0);
> fs/coredump.c: wait_task_inactive(ptr->task, 0);
>
> Use ~0 instead (exposed as TASK_ANY or so) and then we can drop the
> !match_state special case?
>
> They'd do something like:
>
> drivers/powercap/idle_inject.c: wait_task_inactive(iit->tsk, TASK_ANY);
> fs/coredump.c: wait_task_inactive(ptr->task, TASK_ANY);
>
> It's not an entirely 100% equivalent transformation though, but looks OK
> at first sight: ->__state will be some nonzero mask for genuine tasks
> waiting to schedule out, so any match will be functionally the same as a
> 0 flag telling us not to check any of the bits, right? I might be missing
> something though.

I too am thinking that should work. Added patch for that.
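
A rough userspace sketch of why the two forms line up, and of the one spot where they don't. SKETCH_TASK_ANY and the helper names are assumptions for illustration only; the name and value that end up in the actual patch may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "match anything" mask, as suggested above (~0). */
#define SKETCH_TASK_ANY (~0u)

/* Current form: a zero match_state means "don't check the state". */
static inline bool sketch_match_special(unsigned int state, unsigned int match_state)
{
	return !match_state || (state & match_state);
}

/* Proposed form: no special case, callers pass an all-ones mask instead. */
static inline bool sketch_match_mask(unsigned int state, unsigned int match_state)
{
	return (state & match_state) != 0;
}

static inline void sketch_any_selfcheck(void)
{
	/* Any nonzero state: passing 0 before equals passing the ANY mask now. */
	for (unsigned int state = 1; state < 0x100u; state++)
		assert(sketch_match_special(state, 0) ==
		       sketch_match_mask(state, SKETCH_TASK_ANY));

	/* A zero state is the non-equivalent case: the old form matches,
	 * the mask-only form does not. */
	assert(sketch_match_special(0, 0));
	assert(!sketch_match_mask(0, SKETCH_TASK_ANY));
}
```

So the transformation holds as long as ->__state is nonzero at the point of the check, which is the assumption made in the quoted text; a state of 0 is the only input where the results differ.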