Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

From: Mike Galbraith
Date: Thu Sep 27 2012 - 02:34:24 EST


On Thu, 2012-09-27 at 07:47 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@xxxxxx> wrote:
>
> > I think the pgbench problem is more about latency for the 1 in
> > 1:N than spinlocks.
>
> So my understanding of the psql workload is that basically we've
> got a central psql proxy process that is distributing work to
> worker psql processes. If a freshly woken worker process ever
> preempts the central proxy process then it is preventing a lot
> of new work from getting distributed.
>
> Correct?

Yeah, that's my understanding of the thing, and I played with it quite a
bit in the past (I only briefly refreshed my memory this time around).

> So the central proxy psql process is 'much more important' to
> run than any of the worker processes - an importance that is not
> (currently) visible from the behavioral statistics the scheduler
> keeps on tasks.

Yeah. We had the adaptive waker thing, but it stopped being a winner at
the one load it originally did help quite a lot, and it didn't help
pgbench all that much in its then form anyway, iirc.

> So the scheduler has the following problem here: a new wakee
> might be starved enough and the proxy might have run long enough
> to really justify the preemption here and now. The buddy
> statistics help avoid some of these cases - but not all and the
> difference is measurable.
>
> Yet the 'best' way for psql to run is for this proxy process to
> never be preempted. Your SCHED_BATCH experiments confirmed that.

Yes.

> The way remote CPU selection affects it is that if we ever get
> more aggressive in selecting a remote CPU then we, as a side
> effect, also reduce the chance of harmful preemption of the
> central proxy psql process.

Right.

> So in that sense sibling selection is somewhat of an indirect
> red herring: it really only helps psql indirectly by preventing
> the harmful preemption. It also, somewhat paradoxically, argues
> for suboptimal code: for example tearing apart buddies is
> beneficial in the psql workload, because it also allows the more
> important part of the buddy to run more (the proxy).

Yes, I believe preemption dominates, but it's not alone, you can see
that in the numbers.

> In that sense the *real* problem isn't even parallelism (although
> we obviously should improve the decisions there - and the logic
> has suffered in the past from the psql dilemma outlined above),
> but whether the scheduler can (and should) identify the central
> proxy and keep it running as much as possible, deprioritizing
> fairness, wakeup buddies, runtime overlap and cache affinity
> considerations.
>
> There are two broad solutions that I can see:
>
> - Add a kernel solution to somehow identify 'central' processes
>   and bias them. Xorg is a similar kind of process, so it would
>   help other workloads as well. That way lie dragons, but might
>   be worth an attempt or two. We already try to do a couple of
>   robust metrics, like overlap statistics to identify buddies.

What we do now works well for X and friends I think, because there
aren't so many buddies. It might work better though, and for the same
reasons. I've in fact [re]invented a SCHED_SERVER class a few times,
but never one that survived my own scrutiny for long.

Arrr, here there be dragons is true ;-)

> - Let user-space occasionally identify its important (and less
>   important) tasks - say psql could mark its worker processes as
>   SCHED_BATCH and keep its central process(es) higher prio. A
>   single line of obvious code in 100 KLOCs of user-space code.
>
> Just to confirm, if you turn off all preemption via a hack
> (basically if you turn SCHED_OTHER into SCHED_BATCH), does psql
> perform and scale much better, with the quality of sibling
> selection and spreading of processes only being a secondary
> effect?

That has always been the case here. Preemption dominates. Others
should play with it too, and let their boxen speak.
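
For anyone who wants to try the user-space variant, here is a rough
sketch of the idea in Python on Linux (the same thing in C is a
sched_setscheduler(2) call). It is not anything PostgreSQL does today:
the role names and the helpers policy_for_role(), mark_task() and
spawn_worker() are made up for illustration. The point is only that a
SCHED_BATCH task, when woken, does not wakeup-preempt whatever is
currently running, so the dispatcher keeps the CPU.

```python
import os

# Hypothetical roles for illustration; the policy constants are Linux-only.
ROLE_POLICY = {
    "dispatcher": os.SCHED_OTHER,  # default policy, normal wakeup preemption
    "worker": os.SCHED_BATCH,      # a woken batch task does not preempt the running task
}


def policy_for_role(role):
    """Return the scheduling policy this sketch assigns to a role."""
    return ROLE_POLICY[role]


def mark_task(pid, role):
    """Try to apply the role's policy to pid (0 means the calling process).

    Both SCHED_OTHER and SCHED_BATCH require a static priority of 0.
    Returns True on success, False if the kernel refused the request.
    """
    try:
        os.sched_setscheduler(pid, policy_for_role(role), os.sched_param(0))
    except OSError:
        return False
    return True


def spawn_worker(job):
    """Fork a child, mark it as a batch worker, run job() in it.

    Returns the child's pid to the parent (the dispatcher).
    """
    pid = os.fork()
    if pid == 0:
        mark_task(0, "worker")  # best effort; the worker runs either way
        job()
        os._exit(0)
    return pid
```

The dispatcher stays SCHED_OTHER and only the forked children change
policy, which is the "single line" version of the experiment. Whether
it helps on a given box is something to measure, not assume.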

-Mike
