Re: [PATCH 2/2] sched: Consider task_struct::saved_state in wait_task_inactive().

From: Valentin Schneider
Date: Mon Jul 25 2022 - 13:48:15 EST


On 20/07/22 17:44, Sebastian Andrzej Siewior wrote:
> Ptrace uses wait_task_inactive() to wait for the tracee to reach a
> certain task state. On PREEMPT_RT that state may be stored in
> task_struct::saved_state while the tracee blocks on a sleeping lock and
> task_struct::__state is set to TASK_RTLOCK_WAIT.
> It is not possible to check only for TASK_RTLOCK_WAIT to be sure that the task
> is blocked on a sleeping lock because during wake up (after the sleeping lock
> has been acquired) the task state is set to TASK_RUNNING. Once the task is on
> a CPU and has acquired the pi_lock, it will restore the state accordingly, but
> until then TASK_RUNNING will be observed (with the desired state saved in
> saved_state).
>
> On PREEMPT_RT, also check task_struct::saved_state if the desired match was
> not found in task_struct::__state. If the state was found in saved_state, wait
> until the task is idle and the state is visible in task_struct::__state.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

I have a suggestion and a comment below, but other than that this looks OK.

Reviewed-by: Valentin Schneider <vschneid@xxxxxxxxxx>

> ---
> kernel/sched/core.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 41 insertions(+), 5 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3257,6 +3257,40 @@ int migrate_swap(struct task_struct *cur
> }
> #endif /* CONFIG_NUMA_BALANCING */
>
> +#ifdef CONFIG_PREEMPT_RT

Would something like the below be useful?

/*
* If p->saved_state is anything other than TASK_RUNNING, then p blocked on an
* rtlock *before* voluntarily calling into schedule() after setting its state
* to X. For things like ptrace (X=TASK_TRACED), the task could have more work
* to do upon acquiring the lock before whoever called wait_task_inactive()
* should return. IOW, we have to wait for:
*
* p.saved_state = TASK_RUNNING
* p.__state = X
*
* which implies the task isn't blocked on an RT lock and got to schedule() by
* itself.
*
* Also see comments in ttwu_state_match().
*/

> +static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
> +{
> + unsigned long flags;
> + bool mismatch;
> +
> + raw_spin_lock_irqsave(&p->pi_lock, flags);
> + mismatch = READ_ONCE(p->__state) != match_state &&
> + READ_ONCE(p->saved_state) != match_state;
> + raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> + return mismatch;
> +}
> +static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
> + bool *wait)
> +{
> + if (READ_ONCE(p->__state) == match_state)
> + return true;
> + if (READ_ONCE(p->saved_state) != match_state)
> + return false;
> + *wait = true;
> + return true;
> +}
> +#else
> +static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
> +{
> + return READ_ONCE(p->__state) != match_state;
> +}
> +static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
> + bool *wait)
> +{
> + return READ_ONCE(p->__state) == match_state;
> +}
> +#endif
> +
> /*
> * wait_task_inactive - wait for a thread to unschedule.
> *
> @@ -3346,7 +3382,7 @@ unsigned long wait_task_inactive(struct
> * running right now), it's preempted, and we should
> * yield - it could be a while.
> */
> - if (unlikely(queued)) {
> + if (unlikely(wait)) {

We could end up doing this repeatedly for as long as the task is blocked on
the rtlock, but IIUC that's no different from !PREEMPT_RT: a queued task
preempted by a higher-priority task may also take a while to schedule() and
dequeue...

> ktime_t to = NSEC_PER_SEC / HZ;
>
> set_current_state(TASK_UNINTERRUPTIBLE);