Re: [PATCH RFC v2 tip/core/rcu 01/22] sched/core: Add function to sample state of locked-down task

From: Paul E. McKenney
Date: Thu Mar 19 2020 - 13:35:28 EST


On Thu, Mar 19, 2020 at 01:22:38PM -0400, Steven Rostedt wrote:
> On Wed, 18 Mar 2020 17:10:39 -0700
> paulmck@xxxxxxxxxx wrote:
>
> > From: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> >
> > A running task's state can be sampled in a consistent manner (for example,
> > for diagnostic purposes) simply by invoking smp_call_function_single()
> > on its CPU, which may be obtained using task_cpu(), then having the
> > IPI handler verify that the desired task is in fact still running.
> > However, if the task is not running, this sampling can in theory be done
> > immediately and directly. In practice, the task might start running at
> > any time, including during the sampling period. Gaining a consistent
> > sample of a not-running task therefore requires that something be done
> > to lock down the target task's state.
> >
> > This commit therefore adds a try_invoke_on_locked_down_task() function
> > that invokes a specified function if the specified task can be locked
> > down, returning true if successful and if the specified function returns
> > true. Otherwise this function simply returns false. Given that the
> > function passed to try_invoke_on_nonrunning_task() might be invoked with
> > a runqueue lock held, that function had better be quite lightweight.
> >
> > The function is passed the target task's task_struct pointer and the
> > argument passed to try_invoke_on_locked_down_task(), allowing easy access
> > to task state and to a location for further variables to be passed in
> > and out.
> >
> > Note that the specified function will be called even if the specified
> > task is currently running. The function can use ->on_rq and task_curr()
> > to quickly and easily determine the task's state, and can return false
> > if this state is not to the function's liking. The caller of teh
>
> s/teh/the/

Good eyes, will fix!

> > try_invoke_on_locked_down_task() would then see the false return value,
> > and could take appropriate action, for example, trying again later or
> > sending an IPI if matters are more urgent.
> >
> > It is expected that use cases such as the RCU CPU stall warning code will
> > simply return false if the task is currently running. However, there are
> > use cases involving nohz_full CPUs where the specified function might
> > instead fall back to an alternative sampling scheme that relies on heavier
> > synchronization (such as memory barriers) in the target task.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > [ paulmck: Apply feedback from Peter Zijlstra and Steven Rostedt. ]
> > [ paulmck: Invoke if running to handle feedback from Mathieu Desnoyers. ]
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > Cc: Ben Segall <bsegall@xxxxxxxxxx>
> > Cc: Mel Gorman <mgorman@xxxxxxx>
> > ---
> > include/linux/wait.h | 2 ++
> > kernel/sched/core.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 50 insertions(+)
> >
> > diff --git a/include/linux/wait.h b/include/linux/wait.h
> > index 3283c8d..e2bb8ed 100644
> > --- a/include/linux/wait.h
> > +++ b/include/linux/wait.h
> > @@ -1148,4 +1148,6 @@ int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, i
> > (wait)->flags = 0; \
> > } while (0)
> >
> > +bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg);
> > +
> > #endif /* _LINUX_WAIT_H */
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index fc1dfc0..195eba0 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2580,6 +2580,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > *
> > * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> > * __schedule(). See the comment for smp_mb__after_spinlock().
> > + *
> > + * A similar smb_rmb() lives in try_invoke_on_locked_down_task().
> > */
> > smp_rmb();
> > if (p->on_rq && ttwu_remote(p, wake_flags))
> > @@ -2654,6 +2656,52 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > }
> >
> > /**
> > + * try_invoke_on_locked_down_task - Invoke a function on task in fixed state
> > + * @p: Process for which the function is to be invoked.
> > + * @func: Function to invoke.
> > + * @arg: Argument to function.
> > + *
> > + * If the specified task can be quickly locked into a definite state
> > + * (either sleeping or on a given runqueue), arrange to keep it in that
> > + * state while invoking @func(@arg). This function can use ->on_rq and
> > + * task_curr() to work out what the state is, if required. Given that
> > + * @func can be invoked with a runqueue lock held, it had better be quite
> > + * lightweight.
> > + *
> > + * Returns:
> > + * @false if the task slipped out from under the locks.
> > + * @true if the task was locked onto a runqueue or is sleeping.
> > + * However, @func can override this by returning @false.
>
> Should probably state that it will return false if the state could be
> locked, otherwise it returns the return code of the function.

So like this?

* Returns:
* @false if the task state could not be locked.
* Otherwise, the return value from @func(arg).

> I'm wondering if we shouldn't have the function return code be something
> passed in by the parameter, and have this return either true (locked and
> function called), or false (not locked and function wasn't called).

I was thinking of this as one of the possible uses of whatever arg
points to, which allows the caller of try_invoke_on_locked_down_task()
and the specified function to communicate whatever they wish. Then
the specified function could (for example) unconditionally return true
so that the return value from try_invoke_on_locked_down_task() indicated
whether or not the specified function was called.

The current setup is very convenient for the use cases thus far. It
allows the function to say "Yeah, I was called, but I couldn't do
anything", thus allowing the caller to make exactly one check to know
that corrective action is required.

> > + */
> > +bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
> > +{
> > + bool ret = false;
> > + struct rq_flags rf;
> > + struct rq *rq;
> > +
> > + lockdep_assert_irqs_enabled();
> > + raw_spin_lock_irq(&p->pi_lock);
> > + if (p->on_rq) {
> > + rq = __task_rq_lock(p, &rf);
> > + if (task_rq(p) == rq)
> > + ret = func(p, arg);
> > + rq_unlock(rq, &rf);
> > + } else {
> > + switch (p->state) {
> > + case TASK_RUNNING:
> > + case TASK_WAKING:
> > + break;
> > + default:
>
> Don't we need a comment here about why we have a rmb() and where the
> matching wmb() is?

Indeed we do! I will fix that as well.

Thanx, Paul

> -- Steve
>
> > + smp_rmb();
> > + if (!p->on_rq)
> > + ret = func(p, arg);
> > + }
> > + }
> > + raw_spin_unlock_irq(&p->pi_lock);
> > + return ret;
> > +}
> > +
> > +/**
> > * wake_up_process - Wake up a specific process
> > * @p: The process to be woken up.
> > *
>