Re: [PATCH] sched: next buddy hint on sleep and preempt path

From: Mike Galbraith
Date: Wed Mar 02 2011 - 02:40:52 EST


On Tue, 2011-03-01 at 23:08 -0800, Paul Turner wrote:
> On Tue, Mar 1, 2011 at 10:47 PM, Mike Galbraith <efault@xxxxxx> wrote:
> > On Tue, 2011-03-01 at 21:43 -0800, Paul Turner wrote:
> >> On Tue, Mar 1, 2011 at 3:33 PM, Venkatesh Pallipadi <venki@xxxxxxxxxx> wrote:
> >
> >> >          for_each_sched_entity(se) {
> >> >                  cfs_rq = cfs_rq_of(se);
> >> >                  dequeue_entity(cfs_rq, se, flags);
> >> >
> >> >                  /* Don't dequeue parent if it has other entities besides us */
> >> > -                if (cfs_rq->load.weight)
> >> > +                if (cfs_rq->load.weight) {
> >> > +                        /*
> >> > +                         * Bias pick_next to pick a task from this cfs_rq, as
> >> > +                         * p is sleeping when it is within its sched_slice.
> >> > +                         */
> >> > +                        if (task_flags & DEQUEUE_SLEEP && se->parent)
> >> > +                                set_next_buddy(se->parent);
> >>
> >> re-using the last_buddy would seem like a more natural fit here; also
> >> doesn't have a clobber race with a wakeup
> >
> > Hm, that would break last_buddy, no? A preempted task won't get the CPU
> > back after a light preempting thread deactivates. (It's disabled atm
> > unless heavily overloaded anyway, but..)
>
> Ommm yeah.. we're actually a little snookered in this case since the
> pre-empting thread's sleep will be voluntary which will try to return
> time to its hierarchy.
>
> I suppose keeping the last_buddy is preferable to the occasional clobber.

Yeah, I think we don't want to break it. I don't know if pgsql still
uses userland spinlocks (I haven't run it in quite a while now), but with
those nasty things, last_buddy was the only thing that kept it from
collapsing into a quivering heap when you tried to scale. Preempting a
userland spinlock holder gets ugly in the extreme, since every waiter
just spins until the holder gets the CPU back and releases the lock.

I'm going to test this patch some more, but in light testing I saw no
interactivity problems with it, and it does _seem_ to improve
throughput when competing grouped loads share the box. I haven't
tested that heavily, though; that's just from watching the numbers and
recalling the relative effect of mixing loads previously.

-Mike
