Re: RFC [patch] sched: strengthen LAST_BUDDY and minimize buddy induced latencies V3

From: Mike Galbraith
Date: Tue Oct 20 2009 - 01:03:46 EST


On Tue, 2009-10-20 at 06:24 +0200, Peter Zijlstra wrote:
> On Sat, 2009-10-17 at 12:24 +0200, Mike Galbraith wrote:
> > sched: strengthen LAST_BUDDY and minimize buddy induced latencies.
> >
> > This patch restores the effectiveness of LAST_BUDDY in preventing pgsql+oltp
> > from collapsing due to wakeup preemption. It also minimizes buddy induced
> > latencies. x264 testcase spawns new worker threads at a high rate, and was
> > being affected badly by NEXT_BUDDY. It turned out that CACHE_HOT_BUDDY was
> > thwarting idle balancing. This patch ensures that the load can disperse,
> > and that buddies can't make any task excessively late.
>
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -2007,8 +2007,12 @@ task_hot(struct task_struct *p, u64 now,
> >
> >  	/*
> >  	 * Buddy candidates are cache hot:
> > +	 *
> > +	 * Do not honor buddies if there may be nothing else to
> > +	 * prevent us from becoming idle.
> >  	 */
> >  	if (sched_feat(CACHE_HOT_BUDDY) &&
> > +			task_rq(p)->nr_running >= sched_nr_latency &&
> >  			(&p->se == cfs_rq_of(&p->se)->next ||
> >  			 &p->se == cfs_rq_of(&p->se)->last))
> >  		return 1;
>
> I'm not sure about this. The sched_nr_latency seems arbitrary, 1 seems
> like a more natural boundary.

That's what I did first, which of course worked fine.
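
For anyone following along, here is a rough userspace sketch of the
boundary being discussed. It is a toy model, not kernel code: struct
toy_task and toy_buddy_hot() are made-up names, and the threshold is a
parameter so that a small boundary and sched_nr_latency (assumed to be 5
here) can both be tried.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the buddy test in task_hot() reads. */
struct toy_task {
	unsigned int rq_nr_running;	/* runnable tasks on the task's runqueue */
	bool is_buddy;			/* task is the cfs_rq's next or last buddy */
};

/*
 * Model of the patched buddy test: a buddy only counts as cache hot
 * when its runqueue holds at least 'threshold' runnable tasks.  Below
 * that, the buddy hint is ignored, so this check no longer blocks an
 * idle CPU from pulling the task.
 */
static bool toy_buddy_hot(const struct toy_task *p, unsigned int threshold)
{
	return p->is_buddy && p->rq_nr_running >= threshold;
}
```

With two runnable tasks on the queue, a buddy is not hot at a threshold
of 5 but is hot at a threshold of 2, which is the whole difference
between the two boundaries in this model.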

What I'm thinking of doing instead, though, is to specifically target the
only case where I see the problem, i.e. fork/exec load wanting to disperse.
I don't really want to see buddies being ripped away from their cache.
But as you note below, that can be a good thing iff the task lands on a
shared cache. In my case, there's a 1 in 3 chance of a safe landing.

> Also, one thing that Arjan found was that we don't need to consider
> buddies cache hot if we're migrating them within a cache domain. So we
> need to add a SD_flag and sched_domain to properly represent the cache
> hierarchy.

Yeah, I thought about this too. If there's any overlap time, waking CPU
affine is a loser if there's an idle shared cache next door.
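
A toy userspace model of that idea, again not kernel code: struct
toy_cpu, cache_id and the helpers below are hypothetical names, and the
four CPU layout in the usage note (two cores per shared cache) is only
an assumption that happens to reproduce the 1 in 3 figure above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: each CPU is tagged with the cache it sits under. */
struct toy_cpu {
	int cache_id;
};

static bool toy_shares_cache(const struct toy_cpu *a, const struct toy_cpu *b)
{
	return a->cache_id == b->cache_id;
}

/* A buddy is treated as hot only when the move would leave its cache. */
static bool toy_buddy_hot_for_move(bool is_buddy, const struct toy_cpu *src,
				   const struct toy_cpu *dst)
{
	return is_buddy && !toy_shares_cache(src, dst);
}

/* Count destination CPUs, other than src, that share src's cache. */
static size_t toy_safe_landings(const struct toy_cpu *cpus, size_t n, size_t src)
{
	size_t i, safe = 0;

	for (i = 0; i < n; i++)
		if (i != src && toy_shares_cache(&cpus[src], &cpus[i]))
			safe++;
	return safe;
}
```

With cache ids {0, 0, 1, 1}, CPU 0 has one safe landing out of three
possible destinations, and a buddy moving from CPU 0 to CPU 1 is not
considered hot while one moving to CPU 2 is.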

-Mike
