Re: BUG: unable to handle kernel NULL pointer dereference in irq_may_run

From: Thomas Gleixner
Date: Fri Dec 22 2017 - 14:13:53 EST


On Thu, 21 Dec 2017, syzbot wrote:

> Hello,
>
> syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers

Unfortunately I cannot reproduce that issue.

> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: irqd_has_set kernel/irq/internals.h:230 [inline]
> IP: irq_may_run+0x19/0x70 kernel/irq/chip.c:506
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3177 Comm: kworker/u4:2 Not tainted 4.15.0-rc3-next-20171214+ #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> RIP: 0010:irqd_has_set kernel/irq/internals.h:230 [inline]

So this dereferences

irq_desc->irq_data.common

which is NULL:

2b:* f7 00 00 00 0c 00 testl $0xc0000,(%rax) <-- trapping instruction
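
To make the access chain concrete, here is a stripped-down userspace model of
it. It is a sketch, not the kernel code: the real structures have many more
members, and MAY_RUN_MASK and state_load_address() are names made up for
illustration. The mask value is the one from the trapping instruction.

```c
#include <stddef.h>
#include <stdbool.h>
#include <assert.h>

/* Simplified model; the kernel structs carry many more members. */
struct irq_common_data {
	unsigned int state_use_accessors;	/* first member, so offset 0 */
};

struct irq_data {
	struct irq_common_data *common;		/* the pointer that was NULL */
};

struct irq_desc {
	struct irq_common_data irq_common_data;
	struct irq_data irq_data;		/* embedded in the descriptor */
};

/* Hypothetical name; value taken from: testl $0xc0000,(%rax) */
#define MAY_RUN_MASK 0xc0000u

/* Models irqd_has_set(): loads the state word through d->common. */
static bool irqd_has_set(const struct irq_data *d, unsigned int mask)
{
	return d->common->state_use_accessors & mask;
}

/* Hypothetical helper: address the load touches for a given 'common' value. */
static size_t state_load_address(size_t common_ptr)
{
	return common_ptr + offsetof(struct irq_common_data, state_use_accessors);
}
```

With the state word at offset 0, a NULL common pointer makes the load hit
address 0, which matches CR2 in the report.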

> RIP: 0010:irq_may_run+0x19/0x70 kernel/irq/chip.c:506
> RSP: 0018:ffff88021fc03f58 EFLAGS: 00010006
> RAX: 0000000000000000 RBX: ffff8802151fa400 RCX: ffffffff81243385

       ^^^^^^^^^^^^^^^^

> RDX: 0000000000010000 RSI: 0000000000000000 RDI: ffff8802151fa400
> RBP: ffff88021fc03f68 R08: 0000000000000001 R09: 000000000000000c
> R10: ffff88021fc03ee8 R11: 000000000000000c R12: 0000000000000001
> R13: ffff8802151fa400 R14: 0000000000000027 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000000301e003 CR4: 00000000001606f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <IRQ>
> handle_edge_irq+0x33/0x220 kernel/irq/chip.c:755
> generic_handle_irq_desc include/linux/irqdesc.h:159 [inline]
> handle_irq+0x15/0x20 arch/x86/kernel/irq_64.c:77
> do_IRQ+0x53/0x100 arch/x86/kernel/irq.c:229
> common_interrupt+0xa9/0xa9 arch/x86/entry/entry_64.S:695

Now what confuses me is the fact that

irq_desc->irq_data.common

is initialized in desc_set_defaults() when the irq descriptor is
allocated. It's not written to after that. Plus it got dereferenced before.
So this looks like a stray pointer.
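
The invariant I am relying on can be modelled in userspace like this. Again a
sketch under assumptions: the structs are reduced to the members that matter
here, desc_set_defaults() shows only the one relevant assignment, and
desc_common_intact() is a hypothetical check, not an existing kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model; the kernel structs carry many more members. */
struct irq_common_data {
	unsigned int state_use_accessors;
};

struct irq_data {
	struct irq_common_data *common;
};

struct irq_desc {
	struct irq_common_data irq_common_data;
	struct irq_data irq_data;
};

/* Models the single assignment in desc_set_defaults() that matters here. */
static void desc_set_defaults(struct irq_desc *desc)
{
	desc->irq_data.common = &desc->irq_common_data;
}

/* Hypothetical check: true while the pointer still has its initial value. */
static bool desc_common_intact(const struct irq_desc *desc)
{
	return desc->irq_data.common == &desc->irq_common_data;
}
```

Since the pointer only ever refers to a member of the same descriptor, any
other value, NULL included, would mean something overwrote it after
allocation.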

I have no clue how that could be related to the reproducer. Does this
reproduce 100% on your end? If yes, I can add some debug code which
might help to catch this.

Thanks,

tglx