Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP

From: Ingo Molnar
Date: Wed Nov 20 2019 - 06:40:39 EST



* Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Mon, Nov 18, 2019 at 05:29:42PM +0100, Dmitry Vyukov wrote:
> > > And of not having a standard way to signal "this line starts something
> > > that should be reported as a bug"? Maybe as a longer-term idea, it'd
> > > help to have some sort of extra prefix byte that the kernel can print
> > > to say "here comes a bug report, first line should be the subject", or
> > > something like that, similar to how we have loglevels...
> >
> > This would be great.
> > Also a way to denote crash end.
> > However we have lots of special logic for subjects, not sure if kernel
> > could provide good subject:
> > https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L537-L1588
> > Probably it could, but it won't be completely trivial. E.g. if there
> > is a stall inside of a timer function, it should give the name of the
> > actual timer callback as identity ("stall in timer_subsystem_foo"). Or
> > for syscalls we use more disambiguation b/c "in sys_ioclt" is not much
> > different than saying "there is a bug in kernel" :)
>
> While external tools are fine and cool, they can't really block kernel
> development and printk strings format is not an ABI. And yeah, we have
> this discussion each time someone proposes changes to those "magic"
> strings but I guess it is about time to fix this in a way that any
> future changes don't break tools.
>
> And so I like the idea of marking *only* the first splat with some small
> prefix char as that first splat is the special and very important one.
> I.e., the one where die_counter is 0.
>
> So I could very well imagine something like:
>
> ...
> [ 2.523708] Write protecting the kernel read-only data: 16384k
> [ 2.524729] Freeing unused kernel image (text/rodata gap) memory: 2040K
> [ 2.525594] Freeing unused kernel image (rodata/data gap) memory: 368K
> [ 2.541414] x86/mm: Checked W+X mappings: passed, no W+X pages found.
>
> <--- important first splat starts here:
>
> [ 2.542218] [*] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> [ 2.543343] [*] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
> [ 2.544138] [*] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> [ 2.545120] [*] RIP: 0010:kernel_init+0x58/0x107
> [ 2.546055] [*] Code: 48 c7 c7 e0 5c e7 81 e8 eb d2 90 ff c7 05 77 d6 95 00 02 00 00 00 e8 4e 1d a2 ff e8 69 b7 91 ff 48 b8 01 00 00 00 00 00 ff df <ff> e0 48 8b 3d fe 54 d7 00 48 85 ff 74 22 e8 76 93 84 ff 85 c0 89

> <--- and ends here.
>
> to denote that first splat. And the '*' thing is just an example - it
> can be any char - whatever's easier to grep for.

Well, this would break various pieces of tooling I'm sure.

Maybe it would be nicer to tooling to embedd the splat-counter in the
timestamp in a way:

> [ 2.542218-#1] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> [ 2.543343-#1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
> [ 2.544138-#1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> [ 2.545120-#1] RIP: 0010:kernel_init+0x58/0x107
> [ 2.546055-#1] Code: 48 c7 c7 e0 5c e7 81 e8 eb d2 90 ff c7 05 77 d6 95 00 02 00 00 00 e8 4e 1d a2 ff e8 69 b7 91 ff 48 b8 01 00 00 00 00 00 ff df <ff> e0 48 8b 3d fe 54 d7 00 48 85 ff 74 22 e8 76 93 84 ff 85 c0 89

That way we'd not only know that it's the first splat, but we'd know it
from all the *other* splats as well where they are in the splat-rank ;-)

(Also Cc:-ed Linus.)

Thanks,

Ingo