Re: tty: deadlock between n_tracerouter_receivebuf and flush_to_ldisc

From: Peter Zijlstra
Date: Thu Jan 21 2016 - 05:20:25 EST


On Thu, Jan 21, 2016 at 11:06:45AM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 20, 2016 at 5:08 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
> > On 01/20/2016 05:02 AM, Peter Zijlstra wrote:
> >> On Wed, Dec 30, 2015 at 11:44:01AM +0100, Dmitry Vyukov wrote:
> >>> -> #3 (&buf->lock){+.+...}:
> >>> [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0 kernel/locking/lockdep.c:3585
> >>> [< inline >] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:112
> >>> [<ffffffff85c8e790>] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
> >>> [<ffffffff82b8c050>] tty_get_pgrp+0x20/0x80 drivers/tty/tty_io.c:2502
> >>
> >> So in any recent code that I look at this function tries to acquire
> >> tty->ctrl_lock, not buf->lock. Am I missing something ?!
> >
> > Yes.
> >
> > The tty locks were annotated with __lockfunc so were being elided from lockdep
> > stacktraces. Greg has a patch in his queue from me that removes the __lockfunc
> > annotation ("tty: Remove __lockfunc annotation from tty lock functions").
> >
> > Unfortunately, I think syzkaller's post-processing stack trace isn't helping
> > either, giving the impression that the stack is still inside tty_get_pgrp().
> >
> > It's not.
>
> I've got a new report on commit
> a200dcb34693084e56496960d855afdeaaf9578f (Jan 18).
> Here is unprocessed version:
> https://gist.githubusercontent.com/dvyukov/428a0c9bfaa867d8ce84/raw/0754db31668602ad07947f9964238b2f9cf63315/gistfile1.txt
> and here is processed one:
> https://gist.githubusercontent.com/dvyukov/42b874213de82d94c35e/raw/2bbced252035821243678de0112e2ed3a766fb5d/gistfile1.txt
>
> Peter, what exactly is wrong with the post-processed version? I would
> be interested in fixing the processing script.
>
> As far as I see it contains the same stacks just with line numbers and
> inlined frames. I am using a significantly different compilation mode
> (kasan + kcov + very recent gcc), so nobody except me won't be able to
> figure out line numbers based on offsets.

I'm not sure anything is wrong with the post-processing. My confusion
(and Jiri) was that the stack trace lists
tty_get_pgrp()->_raw_spin_lock_irqsave() but that function acquires
tty->ctrl_lock, not as the report claims buf->lock.

PeterH says that lockdep omits functions tagged with __lockfunc, but my
reading disagrees with that.

kernel/locking/lockdep.c:save_trace()
save_stack_trace()
dump_trace(.ops = &save_stack_ops)

which has: .address = save_stack_address
__save_stack_address(.nosched = false)

So lockdep should very much include lock functions.


But this confusion is part of the original report, not because of the
post-processing.