Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT

From: Nick Piggin
Date: Tue Jan 20 2009 - 07:34:31 EST


On Tue, Jan 20, 2009 at 12:26:34PM +0100, Ingo Molnar wrote:
>
> * Nick Piggin <npiggin@xxxxxxx> wrote:
>
> > Hi,
> >
> > I'm looking at regressions since 2.6.16, and one is lat_mmap has slowed
> > down. On further investigation, a large part of this is not due to a
> > _regression_ as such, but the introduction of CONFIG_PARAVIRT=y.
> >
> > Now, it is true that lat_mmap is basically a microbenchmark, however it
> > is exercising the memory mapping and page fault handler paths, so we're
> > talking about pretty important paths here. So I think it should be of
> > interest.
> >
> > I've run the tests on a 2s8c AMD Barcelona system, binding the test to
> > CPU0, and running 100 times (stddev is a bit hard to bring down, and my
> > scripts needed 100 runs in order to pick up much smaller changes in the
> > results -- for CONFIG_PARAVIRT, just a couple of runs should show up the
> > problem).
> >
> > Times I believe are in nanoseconds for lmbench, anyway lower is better.
> >
> > non pv AVG=464.22 STD=5.56
> > paravirt AVG=502.87 STD=7.36
> >
> > Nearly 10% performance drop here, which is quite a bit... hopefully
> > people are testing the speed of their PV implementations against non-PV
> > bare metal :)
>
> Ouch, that looks unacceptably expensive. All the major distros turn
> CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the express
> promise to have no measurable runtime overhead.
>
> ( And i suspect the real life mmap cost is probably even more expensive,
> as on a Barcelona all of lmbench fits into the cache hence we don't see
> any real $cache overhead. )

The PV kernel's text is over 100K larger, with nearly 40K of that in mm/ and
kernel/ alone. We definitely don't see the worst of the icache or branch
buffer overhead on this microbenchmark. (wow, that's a nasty amount of bloat :( )
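
As a sanity check on the averages quoted above, here is a minimal sketch
(Python, not part of the original measurements) that works out the relative
slowdown and compares the gap to the run-to-run noise. It assumes the 100
runs are independent and that STD is the per-run standard deviation; the
function names are made up for illustration.

```python
import math

def relative_slowdown(base_avg, new_avg):
    """Fractional increase of new_avg over base_avg (lower latency is better)."""
    return (new_avg - base_avg) / base_avg

def stderr_of_difference(std_a, std_b, runs):
    """Standard error of the difference of two means, each taken over
    `runs` independent samples (an assumption about how the runs were done)."""
    return math.sqrt(std_a ** 2 / runs + std_b ** 2 / runs)

# Figures copied from the quoted lat_mmap results.
NON_PV_AVG, NON_PV_STD = 464.22, 5.56
PV_AVG, PV_STD = 502.87, 7.36
RUNS = 100

slowdown = relative_slowdown(NON_PV_AVG, PV_AVG)
gap = PV_AVG - NON_PV_AVG
sem = stderr_of_difference(NON_PV_STD, PV_STD, RUNS)

print(f"slowdown: {slowdown:.1%}")
print(f"gap: {gap:.2f}, standard error of the gap: {sem:.2f}")
```

Under those assumptions the slowdown comes out at about 8.3%, and the gap
between the two averages is tens of standard errors wide, so it is well
outside the noise.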


> Jeremy, any ideas where this slowdown comes from and how it could be
> fixed?

I had a bit of a poke around the profiles, but nothing stood out. However,
oprofile counted 50% more cycles in the kernel with PV than with non-PV.
I'll have to take a look at the user/system times, because 50% seems
ludicrous... hopefully it's just oprofile noise.
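
One way to cross-check the oprofile figure against user/system times is
sketched below. This is only an illustration, not the script used for the
numbers in this thread: child_cpu_times is a hypothetical helper, and the
command shown is a placeholder to be replaced with the actual benchmark
invocation on each kernel.

```python
import os
import subprocess
import sys

def child_cpu_times(argv):
    """Run argv to completion and return (user_seconds, system_seconds)
    of CPU time charged to it, using the children_* fields of os.times()."""
    before = os.times()
    subprocess.run(argv, check=True)  # waits, so child times get accounted
    after = os.times()
    return (after.children_user - before.children_user,
            after.children_system - before.children_system)

# Placeholder command; substitute the real benchmark command line here.
user, system = child_cpu_times([sys.executable, "-c", "pass"])
print(f"user {user:.3f}s  system {system:.3f}s")
```

Comparing the system time between the PV and non-PV kernels over the same
number of runs should show whether anything like a 50% increase is real.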