Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

From: Peter Zijlstra
Date: Mon Aug 27 2018 - 07:15:11 EST


On Fri, Aug 24, 2018 at 02:24:48PM -0700, Steve Muckle wrote:
> On 08/24/2018 02:47 AM, Peter Zijlstra wrote:
> > > > On 08/17/2018 11:27 AM, Steve Muckle wrote:
> >
> > > > > When rt_mutex_setprio changes a task's scheduling class to RT,
> > > > > we're seeing cases where the task's vruntime is not updated
> > > > > correctly upon return to the fair class.
> >
> > > > > Specifically, the following is being observed:
> > > > > - task is deactivated while still in the fair class
> > > > > - task is boosted to RT via rt_mutex_setprio, which changes
> > > > > the task to RT and calls check_class_changed.
> > > > > - check_class_changed leads to detach_task_cfs_rq, at which point
> > > > > the vruntime_normalized check sees that the task's state is TASK_WAKING,
> > > > > which results in skipping the subtraction of the rq's min_vruntime
> > > > > from the task's vruntime
> > > > > - later, when the prio is deboosted and the task is moved back
> > > > > to the fair class, the fair rq's min_vruntime is added to
> > > > > the task's vruntime, even though it wasn't subtracted earlier.
> >
> > I'm thinking that is an incomplete scenario; where do we get to
> > TASK_WAKING?
>
> Yes, there's a missing bit of context at the beginning: the task to
> be boosted had already been put into TASK_WAKING.
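The accounting mismatch in the reported sequence can be sketched with a small userspace toy model. It assumes, as the report claims, that the task is TASK_WAKING at detach time while its vruntime is still absolute. The names toy_task, toy_detach(), toy_attach() and toy_boost_deboost() are hypothetical and only stand in for the detach/attach steps; this is not kernel code, place_entity() is omitted, and min_vruntime is held constant across the round trip for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a CFS task; not the kernel's data structures. */
struct toy_task {
    uint64_t vruntime;
    bool     waking;   /* stands in for p->state == TASK_WAKING */
};

/* Leaving the fair class: subtract min_vruntime unless the task looks
 * already normalized (the toy check treats TASK_WAKING as normalized). */
static void toy_detach(struct toy_task *t, uint64_t min_vruntime)
{
    if (!t->waking) {
        assert(t->vruntime >= min_vruntime);  /* no u64 wraparound in the toy */
        t->vruntime -= min_vruntime;
    }
}

/* Returning to the fair class after deboost: the task is no longer
 * waking, so min_vruntime is added back unconditionally here. */
static void toy_attach(struct toy_task *t, uint64_t min_vruntime)
{
    t->waking = false;
    t->vruntime += min_vruntime;
}

/* Boost to RT, then deboost back to fair; returns the final vruntime. */
static uint64_t toy_boost_deboost(uint64_t vruntime, uint64_t min_vruntime,
                                  bool waking_at_detach)
{
    struct toy_task t = { .vruntime = vruntime, .waking = waking_at_detach };

    toy_detach(&t, min_vruntime);
    toy_attach(&t, min_vruntime);
    return t.vruntime;
}
```

With vruntime 1000 and min_vruntime 900, toy_boost_deboost(1000, 900, false) round-trips to 1000, while toy_boost_deboost(1000, 900, true) ends at 1900: the subtraction was skipped but the addition was not, so the task is inflated by exactly one min_vruntime.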

See, I'm confused...

The only time TASK_WAKING is visible is if we've done a remote wakeup
and it's 'stuck' on the remote wake_list. And in that case we've done
migrate_task_rq_fair() on it.

So by the time either rt_mutex_setprio() or __sched_setscheduler() gets
to calling check_class_changed(), under both pi_lock and rq->lock,
vruntime_normalized() should already give the right answer.
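The argument above can be sketched with the same kind of userspace toy model, assuming (as the reasoning implies) that the migrate step on a remote wakeup has already subtracted min_vruntime before the task sits TASK_WAKING on the wake_list. The names toy_task, toy_vruntime_normalized(), toy_remote_wakeup(), toy_detach() and toy_attach() are hypothetical; the relative field is a ground-truth flag the real kernel does not have, added only so the model can check that the TASK_WAKING shortcut agrees with reality.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a CFS task; not the kernel's data structures. */
struct toy_task {
    uint64_t vruntime;
    bool     waking;     /* stands in for p->state == TASK_WAKING */
    bool     relative;   /* ground truth: has min_vruntime been subtracted? */
};

/* Toy version of the check under discussion: a waking task is
 * treated as already normalized. */
static bool toy_vruntime_normalized(const struct toy_task *t)
{
    return t->waking;
}

/* Remote wakeup: the migrate_task_rq_fair()-like step makes vruntime
 * relative, and only then is the task left TASK_WAKING. */
static void toy_remote_wakeup(struct toy_task *t, uint64_t min_vruntime)
{
    assert(!t->relative && t->vruntime >= min_vruntime);
    t->vruntime -= min_vruntime;
    t->relative = true;
    t->waking = true;
}

/* Class change away from fair: normalize only if not already done. */
static void toy_detach(struct toy_task *t, uint64_t min_vruntime)
{
    if (!toy_vruntime_normalized(t)) {
        assert(t->vruntime >= min_vruntime);
        t->vruntime -= min_vruntime;
    }
    t->relative = true;
}

/* Back in the fair class and no longer waking: re-add min_vruntime. */
static void toy_attach(struct toy_task *t, uint64_t min_vruntime)
{
    t->waking = false;
    t->vruntime += min_vruntime;
    t->relative = false;
}
```

Starting from vruntime 1000 with min_vruntime 900, calling toy_remote_wakeup(), toy_detach() and toy_attach() in that order returns the task to 1000, and the TASK_WAKING check agrees with the relative flag at detach time. The toy keeps min_vruntime fixed; in practice it would generally differ between the source and destination runqueues.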

So please detail the exact scenario, because I'm not seeing it.