Re: missing stack trace entry on NULL pointer call [was: Re: BUG: unable to handle kernel NULL pointer dereference in __generic_file_write_iter]

From: Jann Horn
Date: Thu Feb 28 2019 - 19:54:36 EST


On Thu, Feb 28, 2019 at 5:34 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Thu, Feb 28, 2019 at 1:57 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > On Thu, 28 Feb 2019, Jann Horn wrote:
> > > +Josh for unwinding, +x86 folks
> > > On Wed, Feb 27, 2019 at 11:43 PM Andrew Morton
> > > <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > On Thu, 21 Feb 2019 06:52:04 -0800 syzbot <syzbot+ca95b2b7aef9e7cbd6ab@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit: 4aa9fc2a435a Revert "mm, memory_hotplug: initialize struct..
> > > > > git tree: upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1101382f400000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ca95b2b7aef9e7cbd6ab
> > > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > >
> > > > Not understanding. That seems to be saying that we got a NULL pointer
> > > > deref in __generic_file_write_iter() at
> > > >
> > > > written = generic_perform_write(file, from, iocb->ki_pos);
> > > >
> > > > which isn't possible.
> > > >
> > > > I'm not seeing recent changes in there which could have caused this. Help.
> > >
> > > +
> > >
> > > Maybe the problem is that the frame pointer unwinder isn't designed to
> > > cope with NULL function pointers - or more generally, with an
> > > unwinding operation that starts before the function's frame pointer
> > > has been set up?
> > >
> > > Unwinding starts at show_trace_log_lvl(). That begins with
> > > unwind_start(), which calls __unwind_start(), which uses
> > > get_frame_pointer(), which just returns regs->bp. But that frame
> > > pointer points to the part of the stack that's storing the address of
> > > the caller of the function that called NULL; the caller of NULL is
> > > skipped, as far as I can tell.
> > >
> > > What's kind of annoying here is that we don't have a proper frame set
> > > up yet, we only have half a stack frame (saved RIP but no saved RBP).
> >
> > That wreckage is related to the fact that the indirect calls are going
> > through __x86_indirect_thunk_$REG. I just verified on a VM with some other
> > callback NULL'ed that the resulting backtrace is not really helpful.
> >
> > So in that case generic_perform_write() has two indirect calls:
> >
> > mapping->a_ops->write_begin() and ->write_end()
>
> Does the indirect thunk thing really make any difference? When you
> arrive at RIP=NULL, RSP points to a saved instruction pointer, just
> like when indirect calls are compiled normally.
>
> I just compiled kernels with artificial calls to a NULL function
> pointer (in prctl_set_seccomp()), with retpoline disabled, with both
> unwinders. The ORC unwinder shows a call trace with "?" everywhere
> that doesn't show the caller:
[...]
> So I think this doesn't really have anything to do with
> __x86_indirect_thunk_$REG, and the best possible fix might be to teach
> the unwinders that RIP==NULL means "pretend that RIP is *real_RSP and
> that RSP is real_RSP+8, and report *real_RSP as the first element of
> the backtrace".

Cooking up some patches now...
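
For illustration only (this is not the kernel patch, and none of the names
below exist in the kernel): a small user-space sketch of the rule quoted
above, run against a simulated stack. It assumes the usual x86-64 frame
pointer layout, where [bp] holds the caller's saved bp and [bp+8] holds the
return address. sim_regs, sim_stack, read64() and unwind() are hypothetical
helpers; A stands for the function that called NULL, B for A's caller. With
fix_null_ip == 0 the walk starts from regs.bp and A never shows up; with
fix_null_ip != 0 the word at RSP is reported first, as proposed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical addresses, chosen only to make the trace readable. */
#define STACK_BASE     0x7000ULL
#define RET_INTO_MAIN  0x1000ULL
#define RET_INTO_B     0x2000ULL
#define RET_INTO_A     0x3000ULL  /* A is the function that called NULL */

struct sim_regs {
	uint64_t ip, sp, bp;
};

/* Simulated stack at the moment of the fault; index i is STACK_BASE + 8*i. */
static const uint64_t sim_stack[] = {
	[0] = RET_INTO_A,       /* pushed by the call to NULL; sp points here */
	[1] = 0,                /* a local variable of A */
	[2] = STACK_BASE + 32,  /* A's frame: saved bp of B; bp points here */
	[3] = RET_INTO_B,       /* A's frame: return address into B */
	[4] = 0,                /* B's frame: saved bp, 0 ends the chain */
	[5] = RET_INTO_MAIN,    /* B's frame: return address into main */
};

/* Read one 64-bit word from the simulated stack; 0 if out of range. */
static uint64_t read64(uint64_t addr)
{
	size_t nwords = sizeof(sim_stack) / sizeof(sim_stack[0]);

	if (addr < STACK_BASE || (addr - STACK_BASE) / 8 >= nwords)
		return 0;
	return sim_stack[(addr - STACK_BASE) / 8];
}

/*
 * Frame-pointer walk. If fix_null_ip is set and ip is NULL, pretend that
 * ip is *sp and that sp is sp + 8, so *sp becomes the first trace entry.
 * Returns the number of entries written to out[].
 */
static size_t unwind(struct sim_regs regs, int fix_null_ip,
		     uint64_t *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);

	if (fix_null_ip && regs.ip == 0) {
		regs.ip = read64(regs.sp);
		regs.sp += 8;
	}
	if (n < max)
		out[n++] = regs.ip;

	/* [bp + 8] is the return address, [bp] is the next frame pointer. */
	for (uint64_t bp = regs.bp; bp != 0 && n < max; bp = read64(bp))
		out[n++] = read64(bp + 8);

	return n;
}
```

Calling unwind() with ip = 0, sp = STACK_BASE and bp = STACK_BASE + 16 gives
{0, RET_INTO_B, RET_INTO_MAIN} without the special case, and
{RET_INTO_A, RET_INTO_B, RET_INTO_MAIN} with it. The real unwinders keep
considerably more state than this, so treat it as a model of the idea only.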