Re: [PATCH v2] scripts: add script for translating stack dump function offsets

From: Linus Torvalds
Date: Fri Sep 16 2016 - 21:59:41 EST


On Fri, Sep 16, 2016 at 5:42 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Fri, Sep 16, 2016 at 05:09:15PM -0700, Linus Torvalds wrote:
>> On Fri, Sep 16, 2016 at 2:26 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> >
>> > Ok, how about this. If this looks ok, would you be willing to apply it?
>>
>> Looks good to me. Did you test the size verification with some made-up cases?
>
> Yep. And I tested all the other edge cases that occurred to me.

Hmm. So I tested it a bit, and I found a few issues..

(1) actual bug: you really need the "-W" flag to 'readelf'. Otherwise
it will truncate the lines to fit in 80 columns, which in turn limits
symbol names to 25 characters or something like that.

(2) usability: I have been known to want to look up multiple symbols.
So could we support a syntax like

/scripts/faddr2line vmlinux function1+15/226 other_fn_name+32/128

or something like that?

(3) noise: I have to say that it seems to work really well, but the
"skipping" messages are a bit verbose.

I guess they practically never actually *trigger*, but

[torvalds@i7 linux]$ ./scripts/faddr2line vmlinux type_show+0x10/45
skipping type_show address at 0xffffffff81023690 due to size
mismatch (45 != 166)
skipping type_show address at 0xffffffff811894f0 due to size
mismatch (45 != 41)
/home/torvalds/v2.6/linux/drivers/video/backlight/backlight.c:213
skipping type_show address at 0xffffffff814e9340 due to size
mismatch (45 != 119)
skipping type_show address at 0xffffffff8157a080 due to size
mismatch (45 != 50)
skipping type_show address at 0xffffffff815bbeb0 due to size
mismatch (45 != 38)
skipping type_show address at 0xffffffff815ea8c0 due to size
mismatch (45 != 35)
skipping type_show address at 0xffffffff8162c650 due to size
mismatch (45 != 40)
skipping type_show address at 0xffffffff8162f910 due to size
mismatch (45 != 38)
skipping type_show address at 0xffffffff81694ec0 due to size
mismatch (45 != 26)

it's almost hard to pick out the case that succeeded from all the
noise from the ones that didn't.

(4) ambiguous "inlining" vs "multiple possible cases":

[torvalds@i7 linux]$ ./scripts/faddr2line vmlinux free+15/36
/home/torvalds/v2.6/linux/./include/crypto/algapi.h:302
/home/torvalds/v2.6/linux/crypto/lrw.c:377
/home/torvalds/v2.6/linux/./include/crypto/algapi.h:302
/home/torvalds/v2.6/linux/crypto/xts.c:334

That's actually two different cases, both of which inline
crypto_instance_ctx(), and both of which are really the exact same
code (just lrw vs xts), so they have the same name and size.

(5) I'd love for the pathnames to be shown relative to the root of the project

And (1) is trivial to fix (use "-Ws" instead of "-s" to readelf).

I don't think (2) is a huge deal, but I suspect it wouldn't be nasty
to do by just using a shell function and iterating over the
arguments..

But (3) might be a "don't care, it's so unusual as to not be an
issue", although it might also be a case of "maybe we should only show
the mismatches if there are *no* matches, or if teh user specified
verbose output with '-v' or something"

I think (4) is worth fixing. Maybe by simply outputting the function
name/offset/size again for each hit, which is something you'd need to
do for (2) anyway, so that the above case would become

[torvalds@i7 linux]$ ./scripts/faddr2line vmlinux free+15
vmlinux free+15/36:
/home/torvalds/v2.6/linux/./include/crypto/algapi.h:302
/home/torvalds/v2.6/linux/crypto/lrw.c:377
vmlinux free+15/36:
/home/torvalds/v2.6/linux/./include/crypto/algapi.h:302
/home/torvalds/v2.6/linux/crypto/xts.c:334

(Note how I didn't give the size on the command line, but the printout
showed it anyway).

And finally, I suspect (5) is not reasonably fixable. Oh well. It
would require some kind of "figure out the largest common prefix of
all the filenames in the whole object file". So I'm *not* talking
about just passing "--basenames" to addr2line.

Hmm. Maybe looking up the "DW_AT_comp_dir" tag in the debug info would
work? And just stripping that off?

But on the whole, this is really nice. I always disliked the stupid
addr2line crap. This just looks like we can get *much* better output
by tweaking things to what the kernel uses/needs..

Would you mind looking at those things? Just (1) I can do myself, but
I'm hoping you'd be willing to maybe do a bit more surgery on your own
script..

Linus