Re: [syzbot] [kernel?] possible deadlock in alarm_handle_timer

From: Thomas Gleixner
Date: Tue Dec 12 2023 - 14:19:56 EST


On Sat, Dec 09 2023 at 11:08, xingwei lee wrote:
> Hello, tglx.

Please do not top-post: https://people.kernel.org/tglx/notes-about-netiquette

> I reproduce this bug with
> linux-next commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?id=eff99d8edbed7918317331ebd1e365d8e955d65e
> kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=61991b2630c19677
> the same configuration as the syzbot dashboard:
> https://syzkaller.appspot.com/bug?extid=f2c4e7bfcca6c6d6324c.

If you want people to look at your report, then the report needs to
carry that information right away.

> However, I do not entangled the information and just try to generate
> repro.c with the configuration provided by syzbot dashboard.

The reproducer file alone is useless without the rest of the information.

> When I try the repro.c in the latest upstream commit:
> f2e8a57ee9036c7d5443382b6c3c09b51a92ec7e, it can't crash the kernel at
> all. Should I assume this bug was fixed by the mainline?

Assumptions are not helpful. And if you look at the syzkaller report
then it's entirely clear that the problem is _NOT_ in
alarm_handle_timer().

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sighand->siglock);
                               local_irq_disable();
                               lock(&new_timer->it_lock);
                               lock(&sighand->siglock);
  <Interrupt>
    lock(&new_timer->it_lock);

 *** DEADLOCK ***

So it's clear that sighand->siglock is taken without interrupts disabled
somewhere and the report tells you exactly where:

-> (&sighand->siglock){+.+.}-{2:2} {
   HARDIRQ-ON-W at:
                    <SNIP/>
                    ptrace_set_stopped kernel/ptrace.c:391 [inline]
                    ptrace_attach+0x401/0x650 kernel/ptrace.c:478
                    <SNIP/>

Now if you had dug into the git history of linux-next then you would
have figured out that the commit which introduced the problem:

5431fdd2c181 ("ptrace: Convert ptrace_attach() to use lock guards")

was removed from linux-next on 2023-11-20, which is 3 days later than
the commit you linked to. It never came back and therefore is not in
mainline.

It's all fine if you try to get a reproducer for something, but you
could have spared all of us a lot of time if you had validated that the
problem still persists in linux-next and upstream.

Thanks,

tglx