Re: WARNING in __mutex_unlock_slowpath

From: Roman Kagan
Date: Tue May 08 2018 - 12:42:20 EST


On Mon, May 07, 2018 at 07:19:04PM +0200, Paolo Bonzini wrote:
> On 29/04/2018 19:00, syzbot wrote:
> > syzbot hit the following crash on upstream commit
> > bf8f5de17442bba5f811e7e724980730e079ee11 (Sat Apr 28 17:05:04 2018 +0000)
> > MAINTAINERS: add myself as maintainer of AFFS
> > syzbot dashboard link:
> > https://syzkaller.appspot.com/bug?extid=35666cba7f0a337e2e79
> >
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5686569910403072
> > syzkaller reproducer:
> > https://syzkaller.appspot.com/x/repro.syz?id=5767017265102848
> > Raw console output:
> > https://syzkaller.appspot.com/x/log.txt?id=6346308495343616
> > Kernel config:
> > https://syzkaller.appspot.com/x/.config?id=7043958930931867332
> > compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+35666cba7f0a337e2e79@xxxxxxxxxxxxxxxxxxxxxxxxx
> > It will help syzbot understand when the bug is fixed. See footer for
> > details.
> > If you forward the report, please keep this part and the footer.
> >
> > ------------[ cut here ]------------
> > DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
> > WARNING: CPU: 0 PID: 4525 at kernel/locking/mutex.c:1032
> > __mutex_unlock_slowpath+0x62e/0x8a0 kernel/locking/mutex.c:1032
> > Kernel panic - not syncing: panic_on_warn set ...
>
> This doesn't make much sense, unless it's a "generic" memory corruption,
> but at least the reproducer seems to be simple, just (in pseudocode)
>
> ioctl(kvm_vm_fd, KVM_HYPERV_EVENTFD,
> { fd = some_eventfd, conn_id = 0, flags = 0 })
> ioctl(kvm_vm_fd, KVM_HYPERV_EVENTFD,
> { fd = -1, conn_id = 5, flags = KVM_HYPERV_EVENTFD_DEASSIGN })
>
> Roman, Cathy, can you give it a quick look? (Reproducing the reproducer
> link: https://syzkaller.appspot.com/x/repro.c?id=5686569910403072).

Something seems broken in the IDR machinery: an IDR holding a single
id==0 entry reliably crashes on an attempt to idr_remove() a non-zero id.
Other combinations look fine: removing the existing id==0 entry, or
removing a nonexistent entry from an IDR with at least one id!=0 entry.
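
The failing case can be sketched as a small test module. This is only an
illustrative sketch, not the code that was actually run: the module and
function names and the dummy payload are hypothetical, and it assumes the
idr_init()/idr_alloc()/idr_remove()/idr_destroy() interface from
<linux/idr.h>. The id 5 mirrors conn_id = 5 in the quoted reproducer, on
the assumption that the conn_id is what KVM ends up using as the IDR id.

```c
// SPDX-License-Identifier: GPL-2.0
/* Hypothetical test module; every name in here is illustrative. */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/gfp.h>
#include <linux/idr.h>

static int dummy;	/* any non-NULL pointer will do as the payload */

static int __init idr_zero_repro_init(void)
{
	struct idr idr;
	int id;

	idr_init(&idr);

	/* Populate the IDR with exactly one entry; [0, 1) forces id 0. */
	id = idr_alloc(&idr, &dummy, 0, 1, GFP_KERNEL);
	if (id < 0) {
		idr_destroy(&idr);
		return id;
	}

	/*
	 * Remove an id that was never allocated. This is expected to
	 * find nothing and return NULL; it is the step that goes wrong
	 * when id 0 is the only entry present.
	 */
	idr_remove(&idr, 5);

	/* Reached only if the call above behaved. */
	pr_info("idr_zero_repro: idr_remove() of absent id survived\n");

	/* Removing the entry that does exist is one of the good cases. */
	idr_remove(&idr, 0);
	idr_destroy(&idr);
	return 0;
}

static void __exit idr_zero_repro_exit(void)
{
}

module_init(idr_zero_repro_init);
module_exit(idr_zero_repro_exit);
MODULE_LICENSE("GPL");
```

Allocating one more entry at a non-zero id before the idr_remove(&idr, 5)
call would turn this into the second "looks fine" combination.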

I still haven't pinpointed the root cause.
Cc-ing Matthew.

Roman.