Re: tty: deadlock between n_tracerouter_receivebuf and flush_to_ldisc

From: Peter Hurley
Date: Thu Jan 21 2016 - 12:51:22 EST


On 01/21/2016 02:20 AM, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 11:06:45AM +0100, Dmitry Vyukov wrote:
>> On Wed, Jan 20, 2016 at 5:08 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>> On 01/20/2016 05:02 AM, Peter Zijlstra wrote:
>>>> On Wed, Dec 30, 2015 at 11:44:01AM +0100, Dmitry Vyukov wrote:
>>>>> -> #3 (&buf->lock){+.+...}:
>>>>> [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0 kernel/locking/lockdep.c:3585
>>>>> [< inline >] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:112
>>>>> [<ffffffff85c8e790>] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
>>>>> [<ffffffff82b8c050>] tty_get_pgrp+0x20/0x80 drivers/tty/tty_io.c:2502
>>>>
>>>> So in any recent code that I look at this function tries to acquire
>>>> tty->ctrl_lock, not buf->lock. Am I missing something ?!
>>>
>>> Yes.
>>>
>>> The tty locks were annotated with __lockfunc so were being elided from lockdep
>>> stacktraces. Greg has a patch in his queue from me that removes the __lockfunc
>>> annotation ("tty: Remove __lockfunc annotation from tty lock functions").
>>>
>>> Unfortunately, I think syzkaller's post-processing stack trace isn't helping
>>> either, giving the impression that the stack is still inside tty_get_pgrp().
>>>
>>> It's not.
>>
>> I've got a new report on commit
>> a200dcb34693084e56496960d855afdeaaf9578f (Jan 18).
>> Here is unprocessed version:
>> https://gist.githubusercontent.com/dvyukov/428a0c9bfaa867d8ce84/raw/0754db31668602ad07947f9964238b2f9cf63315/gistfile1.txt
>> and here is processed one:
>> https://gist.githubusercontent.com/dvyukov/42b874213de82d94c35e/raw/2bbced252035821243678de0112e2ed3a766fb5d/gistfile1.txt
>>
>> Peter, what exactly is wrong with the post-processed version? I would
>> be interested in fixing the processing script.
>>
>> As far as I see it contains the same stacks just with line numbers and
>> inlined frames. I am using a significantly different compilation mode
>> (kasan + kcov + very recent gcc), so nobody except me won't be able to
>> figure out line numbers based on offsets.
>
> I'm not sure anything is wrong with the post-processing. My confusion
> (and Jiri) was that the stack trace lists
> tty_get_pgrp()->_raw_spin_lock_irqsave() but that function acquires
> tty->ctrl_lock, not as the report claims buf->lock.

I think this is the other way around; that lockdep has graphed a circular
dependency chain, but that something is wrong with the stack traces.

> PeterH says that lockdep omits functions tagged with __lockfunc, but my
> reading disagrees with that.
>
> kernel/locking/lockdep.c:save_trace()
> save_stack_trace()
> dump_trace(.ops = &save_stack_ops)
>
> which has: .address = save_stack_address
> __save_stack_address(.nosched = false)
>
> So lockdep should very much include lock functions.

Wild hypothesis on my part.

> But this confusion is part of the original report, not because of the
> post-processing.

Yeah, agreed. The original report is very odd.