Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace

From: Frederic Weisbecker
Date: Fri Jul 08 2016 - 11:17:47 EST


On Fri, Jul 08, 2016 at 09:29:29AM -0500, Josh Poimboeuf wrote:
> On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> >
> > * Byungchul Park <byungchul.park@xxxxxxx> wrote:
> >
> > > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > > I want to proceed separately since it's somewhat independent from each
> > > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > > which makes crossrelease work smoothly.
> > >
> > > What do you think about this way to improve it?
> >
> > I like both of your improvements, the speed up is impressive:
> >
> > [ 2.327597] save_stack_trace() takes 87114 ns
> > ...
> > [ 2.781694] save_stack_trace() takes 20044 ns
> > ...
> > [ 3.103264] save_stack_trace takes 3821 (sched_lock)
> >
> > Could you please also measure call graph recording (perf record -g), how much
> > faster does it get with your patches and what are our remaining performance hot
> > spots?
> >
> > Could you please merge your patches to the latest -tip tree, because this commit I
> > merged earlier today:
> >
> > 81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it
> >
> > conflicts with your patches. (I'll push this commit out later today.)
> >
> > Also, could you please rename the _norm names to _fast or so, to signal that this
> > is a faster but less reliable method to get a stack dump? Nobody knows what
> > '_norm' means, but '_fast' is pretty self-explanatory.
>
> Hm, but is print_context_stack_bp() variant really less reliable? From
> what I can tell, its only differences vs print_context_stack() are:
>
> - It doesn't scan the stack for "guesses" (which are 'unreliable' and
> are ignored by the ops->address() callback anyway).
>
> - It stops if ops->address() returns an error (which in this case means
> the array is full anyway).
>
> - It stops if the address isn't a kernel text address. I think this
> shouldn't normally be possible unless there's some generated code like
> bpf on the stack. Maybe it could be slightly improved for this case.
>
> So instead of adding a new save_stack_trace_fast() variant, why don't we
> just modify the existing save_stack_trace() to use
> print_context_stack_bp()?

I'm not sure this is a good idea. First of all, if the kernel isn't built with
frame pointers, all you have is wild walk guesses. Also, even if frame pointers
are enabled, the "guesses" that bp doesn't validate are important clues for
debugging because they tell about previous calls that happened, or callbacks
that were referred to by the stack.

There are several different users of save_stack_trace() in the kernel, and we
can't be sure that all of them are fine with dropping those guesses.

So I'd rather advocate in favour of a new separate helper.
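
To illustrate the difference, here is a rough user-space sketch, not the
actual dumpstack code. Everything in it is made up for the example: the toy
stack layout (stack[bp] holds the caller's bp as an array index and
stack[bp + 1] holds the return address), the text range, and the names
walk_bp(), walk_scan() and struct trace. It only shows that a bp-only walk
returns a subset of what a full scan reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical text range standing in for a kernel-text-address check. */
#define TEXT_START ((uintptr_t)0x1000)
#define TEXT_END   ((uintptr_t)0x2000)
#define MAX_ENTRIES 16

struct trace {
    uintptr_t addr[MAX_ENTRIES];
    int reliable[MAX_ENTRIES];   /* 1 = confirmed by the bp chain, 0 = guess */
    size_t nr;
};

static int is_text(uintptr_t v)
{
    return v >= TEXT_START && v < TEXT_END;
}

static void record(struct trace *t, uintptr_t addr, int reliable)
{
    if (t->nr < MAX_ENTRIES) {
        t->addr[t->nr] = addr;
        t->reliable[t->nr] = reliable;
        t->nr++;
    }
}

/*
 * Toy layout: stack[bp] holds the caller's bp (as an index, 0 ends the
 * chain) and stack[bp + 1] holds the return address.
 */

/* Fast walk: follow the bp chain only, stop at the first non-text address. */
static void walk_bp(const uintptr_t *stack, size_t n, size_t bp,
                    struct trace *t)
{
    assert(stack && t);
    while (bp + 1 < n && is_text(stack[bp + 1])) {
        record(t, stack[bp + 1], 1);
        if (stack[bp] <= bp)    /* chain ended or went backwards */
            break;
        bp = stack[bp];
    }
}

/* Full scan: visit every word from sp, flagging unconfirmed hits as guesses. */
static void walk_scan(const uintptr_t *stack, size_t n, size_t sp, size_t bp,
                      struct trace *t)
{
    assert(stack && t);
    for (size_t i = sp; i < n; i++) {
        if (!is_text(stack[i]))
            continue;
        if (i == bp + 1) {
            record(t, stack[i], 1);
            if (stack[bp] > bp)
                bp = stack[bp];
        } else {
            record(t, stack[i], 0);
        }
    }
}
```

With a toy stack of { 0x1234, 42, 5, 0x1100, 0x1500, 0, 0x1200 } and bp = 2,
walk_bp() records only 0x1100 and 0x1200, while walk_scan() from sp = 0 also
reports 0x1234 (a stale return address) and 0x1500 (a stored callback) as
guesses. Those two extra entries are the kind of clue that a bp-only
save_stack_trace() would no longer give to its callers.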

Thanks.