Re: [PATCH v4 0/3] Compile-time stack frame pointer validation

From: Ingo Molnar
Date: Thu May 21 2015 - 03:52:40 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> >>
> >> I've never quite understood what the '?' means.
> >
> > It basically means "here's a function address we found on the
> > stack, which may or may not have been called." It's needed
> > because stack walking isn't currently 100% reliable.
>
> It is often quite interesting and helpful, because it shows stale
> data on the stack, giving clues about what happened just before.

Yes, it's basically a zero-cost tracer: often showing a partial trace
of events that happened before.

> Now, I'd like gcc to generally be better about not wasting so much
> stack frame, so in that sense I'd like to see fewer '?' entries just
> from a code quality standpoint, but when debugging those things, the
> downside of "noise" is often cancelled by the upside of "ahh, it
> happens after calling X".
>
> So the "perfect stack frames" is actually not as great a thing as
> some people want to make it seem.

We should definitely also print out the '?' entries: they are very
useful, especially when analyzing rare, difficult to reproduce,
sporadic bugs - which are usually the hardest bugs to fix.

The biggest long term plus of 'perfect stack frames' would not be to
skip the '?' entries (we don't want to skip them!), but to be able to
eventually build the kernel without frame pointers.

Especially on modern x86 CPUs with stack engines (latest Intel and AMD
CPUs) that keep ESP updates out of the later stages of the execution
pipeline, going from RBP framepointers to direct ESP use is
beneficial to performance and compresses the I$ footprint as well:

    text     data      bss       dec      hex  filename
12150606  2565544  1634304  16350454   f97cf6  linux-CONFIG_FRAME_POINTERS=n/vmlinux
13282884  2571744  1617920  17472548  10a9c24  linux-CONFIG_FRAME_POINTERS=y/vmlinux

Here's the I$ cachemiss rate for the 'vfs-mix' workload that I used
in the -falign-functions measurements, with CONFIG_FRAMEPOINTERS=y,
on Intel Sandy Bridge (best of 9x10 runs):

#
# CONFIG_FRAMEPOINTERS=y
#
 Performance counter stats for 'system wide' (10 runs):

       728,328,347      L1-icache-load-misses      ( +-  0.08% )  (100.00%)
    11,891,931,664      instructions               ( +-  0.00% )
           300,023      context-switches           ( +-  0.00% )

       7.324048170 seconds time elapsed            ( +-  0.09% )

... and these are the I$ miss perf stats from running the same
workload on a CONFIG_FRAMEPOINTERS=n kernel:

#
# CONFIG_FRAMEPOINTERS are not set
#
 Performance counter stats for 'system wide' (10 runs):

       687,758,078      L1-icache-load-misses      ( +-  0.10% )  (100.00%)
    10,984,908,013      instructions               ( +-  0.01% )
           300,021      context-switches           ( +-  0.00% )

       7.120867260 seconds time elapsed            ( +-  0.29% )

So if we disable frame pointers, then on this workload (percentages
taken relative to the no-framepointers kernel's numbers):

- the kernel text size is 9.3% smaller
- the number of instructions executed went down by about 8.2%
- the cachemiss rate went down by about 5.9%
- performance went up by about 2.8%.
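
For anyone who wants to re-derive the percentages in the list above, here
is a minimal sketch in Python. The input numbers are copied from the size
and perf stat output quoted earlier; the function and variable names
(pct_overhead, pct_reduction, compare) are my own and purely illustrative.
It computes each delta two ways, because the result depends on which
kernel is taken as the baseline.

```python
# Numbers copied from the size and perf stat output quoted in this mail.
FP_ON = {   # frame pointers enabled
    "text": 13_282_884,
    "instructions": 11_891_931_664,
    "icache_misses": 728_328_347,
    "seconds": 7.324048170,
}
FP_OFF = {  # frame pointers disabled
    "text": 12_150_606,
    "instructions": 10_984_908_013,
    "icache_misses": 687_758_078,
    "seconds": 7.120867260,
}


def pct_overhead(with_fp, without_fp):
    """Extra cost of frame pointers, relative to the no-frame-pointer value."""
    return (with_fp - without_fp) / without_fp * 100.0


def pct_reduction(with_fp, without_fp):
    """Saving from dropping frame pointers, relative to the frame-pointer value."""
    return (with_fp - without_fp) / with_fp * 100.0


def compare():
    """Return {metric: (overhead %, reduction %)} for every metric."""
    return {
        key: (pct_overhead(FP_ON[key], FP_OFF[key]),
              pct_reduction(FP_ON[key], FP_OFF[key]))
        for key in FP_ON
    }


if __name__ == "__main__":
    for key, (overhead, reduction) in compare().items():
        print(f"{key:14s} overhead {overhead:5.2f}%   reduction {reduction:5.2f}%")
```

The "overhead" column (baseline: the kernel without frame pointers) is the
one that lands within about a tenth of a percent of the figures in the
list; measured against the frame-pointer kernel instead, the same deltas
come out somewhat smaller.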

The speedup is actually even better than 2.8%, if you look at average
execution time:

linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.324048170 seconds time elapsed ( +- 0.09% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.470166715 seconds time elapsed ( +- 1.01% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.365047474 seconds time elapsed ( +- 0.25% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.828223324 seconds time elapsed ( +- 2.04% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.427164489 seconds time elapsed ( +- 0.70% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.385565350 seconds time elapsed ( +- 0.35% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.560782318 seconds time elapsed ( +- 1.68% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.399741309 seconds time elapsed ( +- 0.74% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.303746766 seconds time elapsed ( +- 0.04% )

avg = 7.451609

linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.201498813 seconds time elapsed ( +- 0.86% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.120867260 seconds time elapsed ( +- 0.29% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.141642635 seconds time elapsed ( +- 0.15% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.217213506 seconds time elapsed ( +- 0.85% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.163046581 seconds time elapsed ( +- 0.56% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.128939439 seconds time elapsed ( +- 0.23% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.256172853 seconds time elapsed ( +- 0.82% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.122946768 seconds time elapsed ( +- 0.23% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.126018578 seconds time elapsed ( +- 0.18% )

avg = 7.164260

So with framepointers disabled this workload gets faster by 4.0% on
average.

The average result is also pretty stable in the no-framepointers case,
while it fluctuates more in the framepointers case. (and this is why
the 'best runtime' favors the framepointers case - the average is
closer to reality.)
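
The averages and the difference in run-to-run fluctuation can be checked
from the nine per-session times listed above. This is only a sketch: the
helper names (summarize, speedup_pct) are hypothetical, and since each
listed time is itself the mean of a 10-run perf stat session, the result
is a mean of means.

```python
from statistics import mean, stdev

# "seconds time elapsed" values copied from the two res.txt listings above.
FP_ON_RUNS = [
    7.324048170, 7.470166715, 7.365047474, 7.828223324, 7.427164489,
    7.385565350, 7.560782318, 7.399741309, 7.303746766,
]
FP_OFF_RUNS = [
    7.201498813, 7.120867260, 7.141642635, 7.217213506, 7.163046581,
    7.128939439, 7.256172853, 7.122946768, 7.126018578,
]


def summarize(runs):
    """Average, sample standard deviation and max-min spread of the runs."""
    return {
        "avg": mean(runs),
        "stdev": stdev(runs),
        "spread": max(runs) - min(runs),
    }


def speedup_pct(slow_seconds, fast_seconds):
    """How much faster the 'fast' case is, as a percentage."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


if __name__ == "__main__":
    on, off = summarize(FP_ON_RUNS), summarize(FP_OFF_RUNS)
    print(f"frame pointers on : avg {on['avg']:.6f}  stdev {on['stdev']:.4f}  spread {on['spread']:.4f}")
    print(f"frame pointers off: avg {off['avg']:.6f}  stdev {off['stdev']:.4f}  spread {off['spread']:.4f}")
    print(f"average speedup   : {speedup_pct(on['avg'], off['avg']):.2f}%")
```

Comparing the stdev and spread values of the two sets gives a rough
number for how much more the framepointers case fluctuates.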

So the performance advantage of not doing framepointers is not
something we can ignore IMHO: but obviously performance isn't
everything - so if stack unwinding is not robust, then we need and
want frame pointers.

Thanks,

Ingo