Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative

From: Rik van Riel
Date: Mon Apr 04 2016 - 15:36:21 EST


On Mon, 2016-04-04 at 15:31 -0400, Chris Metcalf wrote:
> On 4/4/2016 3:12 PM, Rik van Riel wrote:
> >
> > On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote:
> > >
> > > On arm64, when calling enqueue_task_fair() from migration_cpu_stop(),
> > > we find the nr_running value updated by add_nr_running(), but the
> > > cfs.nr_running value has not always yet been updated. Accordingly,
> > > sched_can_stop_tick() falsely returns true when we are migrating a
> > > second task onto a core.
> > I don't get it.
> >
> > Looking at the enqueue_task_fair(), I see this:
> >
> > 	for_each_sched_entity(se) {
> > 		cfs_rq = cfs_rq_of(se);
> > 		cfs_rq->h_nr_running++;
> > ...
> > 	}
> >
> > 	if (!se)
> > 		add_nr_running(rq, 1);
> >
> > What is the difference between cfs_rq->h_nr_running,
> > and rq->cfs.nr_running?
> >
> > Why do we have two?
> > Are we simply testing against the wrong one in
> > sched_can_stop_tick?
> It seems that using the non-CFS one is what we want. I don't know
> whether using a different CFS count instead might be more correct.
>
> Since I'm not sure what causes the difference I see between tile
> (correct) and arm64 (incorrect), it's hard for me to speculate.
>
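
On the question of the two counters: in kernels of this period, a cfs_rq's nr_running counts the scheduling entities queued directly on that cfs_rq (a task group counts as a single entity), while h_nr_running counts every task in the hierarchy below it. The toy model here is a minimal sketch under that assumption. struct toy_rq, struct toy_cfs_rq and toy_enqueue_in_group() are hypothetical names, not kernel APIs, and the thread does not establish that group scheduling is what differs between the tile and arm64 setups.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; counters only. */
struct toy_cfs_rq {
	unsigned int nr_running;   /* entities queued directly on this cfs_rq */
	unsigned int h_nr_running; /* tasks anywhere in this hierarchy */
};

struct toy_rq {
	unsigned int nr_running;   /* all runnable tasks on the CPU */
	struct toy_cfs_rq cfs;     /* root CFS runqueue */
};

/*
 * Enqueue one task into a task group with its own cfs_rq. The group's
 * entity joins the root cfs_rq only when the group goes from empty to
 * non-empty, so the root's nr_running counts the whole group once.
 */
static void toy_enqueue_in_group(struct toy_rq *rq, struct toy_cfs_rq *group)
{
	bool group_was_empty = (group->nr_running == 0);

	group->nr_running++;          /* the task's entity joins the group */
	group->h_nr_running++;

	if (group_was_empty)
		rq->cfs.nr_running++;     /* the group entity joins the root */
	rq->cfs.h_nr_running++;       /* every level sees one more task */

	rq->nr_running++;             /* the add_nr_running(rq, 1) step */

	/* Entities never outnumber the tasks they contain. */
	assert(rq->cfs.nr_running <= rq->cfs.h_nr_running);
}
```

After two such enqueues into the same group, the model has rq.nr_running == 2 and rq.cfs.h_nr_running == 2 but rq.cfs.nr_running == 1, so a test of rq->cfs.nr_running > 1 would let the tick stop while a test of rq->nr_running > 1 would not.
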
> >
> > >
> > > Correct this by using rq->nr_running instead of rq->cfs.nr_running.
> > > This should always be more conservative, and reverts the test to
> > > the form it had before commit 76d92ac305f2 ("sched: Migrate sched
> > > to use new tick dependency mask model").
> > That would cause us to run the timer tick while running
> > a single SCHED_RR real time task, with a single
> > SCHED_OTHER task sitting in the background (which will
> > not get run until the SCHED_RR task is done).
> No, because in sched_can_stop_tick(), we first handle the special
> cases of RR or FIFO tasks present. For example, RR:
>
> 	if (rq->rt.rr_nr_running) {
> 		if (rq->rt.rr_nr_running == 1)
> 			return true;
> 		else
> 			return false;
> 	}
>
> Once we see there are any RR tasks running, the return value
> ignores any possible SCHED_OTHER tasks. Only after the code
> concludes there are no RR/FIFO tasks do we even look at
> the overall nr_running value.

OK, fair enough. I guess both of the RT cases are
covered already.
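
The check order described above can be sketched as follows. Only the RR branch is quoted in the thread; the FIFO branch (taking the FIFO count as rt_nr_running minus rr_nr_running) is an assumption about how the other RT case is handled, and the final test uses rq->nr_running as the patch proposes. struct toy_tick_rq and toy_can_stop_tick() are hypothetical, trimmed-down stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down runqueue holding only the counters used here. */
struct toy_tick_rq {
	unsigned int nr_running;    /* all runnable tasks, as add_nr_running() tracks */
	unsigned int rt_nr_running; /* runnable RT tasks, FIFO plus RR */
	unsigned int rr_nr_running; /* runnable SCHED_RR tasks only */
};

/* Returns true if the periodic tick may be stopped on this CPU. */
static bool toy_can_stop_tick(const struct toy_tick_rq *rq)
{
	unsigned int fifo_nr_running;

	/* RR tasks are a subset of RT tasks, so this cannot underflow. */
	assert(rq->rt_nr_running >= rq->rr_nr_running);
	fifo_nr_running = rq->rt_nr_running - rq->rr_nr_running;

	/* Assumed FIFO case: a FIFO task is not time-sliced, so no tick. */
	if (fifo_nr_running)
		return true;

	/* RR case, as quoted: only a lone RR task may run tickless. */
	if (rq->rr_nr_running)
		return rq->rr_nr_running == 1;

	/* No RT tasks: any second runnable task needs the tick. */
	return rq->nr_running <= 1;
}
```

With one SCHED_RR task and one SCHED_OTHER task runnable, the RR branch returns true before nr_running is consulted, which is why switching the last test to rq->nr_running does not force the tick on in the case raised earlier in the thread.
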

Patch gets my:

Acked-by: Rik van Riel <riel@xxxxxxxxxx>

--
All Rights Reversed.
