Re: [syzbot] [kernel?] possible deadlock in alarm_handle_timer
From: Thomas Gleixner
Date: Tue Dec 12 2023 - 14:19:56 EST
On Sat, Dec 09 2023 at 11:08, xingwei lee wrote:
> Hello, tglx.
Please do not top-post: https://people.kernel.org/tglx/notes-about-netiquette
> I reproduce this bug with
> linux-next commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?id=eff99d8edbed7918317331ebd1e365d8e955d65e
> kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=61991b2630c19677
> the same configuration as the syzbot dashboard:
> https://syzkaller.appspot.com/bug?extid=f2c4e7bfcca6c6d6324c.
If you want people to look at your report, then the report needs to
carry that information right away.
> However, I do not entangled the information and just try to generate
> repro.c with the configuration provided by syzbot dashboard.
The reproducer file alone is useless without the rest of the information.
> When I try the repro.c in the latest upstream commit:
> f2e8a57ee9036c7d5443382b6c3c09b51a92ec7e, it can't crash the kernel at
> all. Should I assume this bug was fixed by the mainline?
Assumptions are not helpful. And if you look at the syzkaller report
then it's entirely clear that the problem is _NOT_ in
alarm_handle_timer().
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sighand->siglock);
                               local_irq_disable();
                               lock(&new_timer->it_lock);
                               lock(&sighand->siglock);
  <Interrupt>
    lock(&new_timer->it_lock);

 *** DEADLOCK ***

So it's clear that sighand->siglock is taken without interrupts disabled
somewhere and the report tells you exactly where:
-> (&sighand->siglock){+.+.}-{2:2} {
   HARDIRQ-ON-W at:
     <SNIP/>
     ptrace_set_stopped kernel/ptrace.c:391 [inline]
     ptrace_attach+0x401/0x650 kernel/ptrace.c:478
     <SNIP/>
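
The rule behind that report can be sketched in plain userspace C. This
is a toy model under stated assumptions, not lockdep's implementation:
the names lock_class, lock_model, irq_unsafe_scenario() and
report_scenario_is_unsafe() are made up for illustration. It records
two usage bits per lock class plus the "held, then acquired" edges, and
flags a problem when a lock used in hardirq context leads to a lock
that is also taken with interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. None of these names exist in the kernel. */
enum { MAX_LOCKS = 8 };

struct lock_class {
	const char *name;
	bool used_in_hardirq;         /* acquired from hard interrupt context */
	bool used_hardirq_on;         /* acquired with interrupts enabled (HARDIRQ-ON-W) */
	bool taken_inside[MAX_LOCKS]; /* [j] set: class j was acquired while this one was held */
};

struct lock_model {
	struct lock_class cls[MAX_LOCKS];
	size_t n;
};

static size_t add_class(struct lock_model *m, const char *name)
{
	assert(m->n < MAX_LOCKS);
	m->cls[m->n].name = name;
	return m->n++;
}

/* Is 'to' reachable from 'from' via held -> acquired edges? */
static bool reaches(const struct lock_model *m, size_t from, size_t to,
		    bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (size_t j = 0; j < m->n; j++) {
		if (m->cls[from].taken_inside[j] && !seen[j] &&
		    reaches(m, j, to, seen))
			return true;
	}
	return false;
}

/*
 * Unsafe: a lock used in hardirq context is, or leads to, a lock
 * that is also acquired with interrupts enabled.
 */
static bool irq_unsafe_scenario(const struct lock_model *m)
{
	for (size_t a = 0; a < m->n; a++) {
		if (!m->cls[a].used_in_hardirq)
			continue;
		for (size_t b = 0; b < m->n; b++) {
			bool seen[MAX_LOCKS] = { false };

			if (m->cls[b].used_hardirq_on && reaches(m, a, b, seen))
				return true;
		}
	}
	return false;
}

/* The two locks from the report above, with the ptrace path switchable. */
static bool report_scenario_is_unsafe(bool siglock_taken_with_irqs_on)
{
	struct lock_model m = { .n = 0 };
	size_t it_lock = add_class(&m, "&new_timer->it_lock");
	size_t siglock = add_class(&m, "&sighand->siglock");

	/* <Interrupt> on CPU0: it_lock is taken in hardirq context. */
	m.cls[it_lock].used_in_hardirq = true;
	/* CPU1: siglock is taken while it_lock is held. */
	m.cls[it_lock].taken_inside[siglock] = true;
	/* ptrace_attach() path from the HARDIRQ-ON-W trace. */
	m.cls[siglock].used_hardirq_on = siglock_taken_with_irqs_on;

	return irq_unsafe_scenario(&m);
}
```

In this model report_scenario_is_unsafe(true) flags the scenario, and
report_scenario_is_unsafe(false) does not. That matches the point made
here: the alarm timer side is unchanged in both cases, only the way
siglock is taken on the ptrace side differs.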
Now if you'd dig into the git history of linux-next then you'd figure
out that the commit which introduced the problem:
5431fdd2c181 ("ptrace: Convert ptrace_attach() to use lock guards")
was removed from linux-next on 2023-11-20, which is 3 days after the
linux-next commit you linked to. It never came back and therefore is
not in mainline.
It's all fine if you try to get a reproducer for something, but you
could have spared all of us a lot of time if you had validated that the
problem still persists in linux-next and upstream.
Thanks,
tglx