Re: tty: deadlock between n_tracerouter_receivebuf and flush_to_ldisc

From: Peter Hurley
Date: Thu Jan 21 2016 - 12:43:40 EST


On 01/21/2016 02:06 AM, Dmitry Vyukov wrote:
> On Wed, Jan 20, 2016 at 5:08 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>> On 01/20/2016 05:02 AM, Peter Zijlstra wrote:
>>> On Wed, Dec 30, 2015 at 11:44:01AM +0100, Dmitry Vyukov wrote:
>>>> -> #3 (&buf->lock){+.+...}:
>>>> [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0 kernel/locking/lockdep.c:3585
>>>> [< inline >] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:112
>>>> [<ffffffff85c8e790>] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
>>>> [<ffffffff82b8c050>] tty_get_pgrp+0x20/0x80 drivers/tty/tty_io.c:2502
>>>
>>> So in any recent code that I look at this function tries to acquire
>>> tty->ctrl_lock, not buf->lock. Am I missing something ?!
>>
>> Yes.
>>
>> The tty locks were annotated with __lockfunc so were being elided from lockdep
>> stacktraces. Greg has a patch in his queue from me that removes the __lockfunc
>> annotation ("tty: Remove __lockfunc annotation from tty lock functions").
>>
>> Unfortunately, I think syzkaller's post-processing stack trace isn't helping
>> either, giving the impression that the stack is still inside tty_get_pgrp().
>>
>> It's not.
>
> I've got a new report on commit
> a200dcb34693084e56496960d855afdeaaf9578f (Jan 18).
> Here is unprocessed version:
> https://gist.githubusercontent.com/dvyukov/428a0c9bfaa867d8ce84/raw/0754db31668602ad07947f9964238b2f9cf63315/gistfile1.txt
> and here is processed one:
> https://gist.githubusercontent.com/dvyukov/42b874213de82d94c35e/raw/2bbced252035821243678de0112e2ed3a766fb5d/gistfile1.txt
>
> Peter, what exactly is wrong with the post-processed version?

Yeah, ok, I assumed the problem with this report was post-processing
because of the other report that had mixed-up info.

However, the #3 stacktrace is obviously wrong, as others have already noted.
Plus, the #1 stacktrace is wrong as well.

> I would be interested in fixing the processing script.

Not that it's related (since the original, not-edited report has bogus
stacktraces), but how are you doing debug symbol lookup?

Because below is not correct. Should be kernel/kthread.c:177 (or thereabouts)

[<ffffffff813b423f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303


> As far as I see it contains the same stacks just with line numbers and
> inlined frames.

Agree, now that I see the original report.

> I am using a significantly different compilation mode
> (kasan + kcov + very recent gcc), so nobody except me won't be able to
> figure out line numbers based on offsets.

Weird. Maybe something to do with the compiler.

Can you get me the dmesg output running the patch below?

Please first just get me the output shortly after boot (please do a
ctrl+C in a terminal window to trigger the expected stacktrace
corresponding to #3 stacktrace from your report).

Then please run your test that generates the circular lockdep report.

--- >% ---
Subject: [PATCH] debug: dump stacktraces for tty lock dependencies

---
kernel/locking/lockdep.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 60ace56..b67cafb 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -332,7 +332,7 @@ EXPORT_SYMBOL(lockdep_on);
* Debugging switches:
*/

-#define VERBOSE 0
+#define VERBOSE 1
#define VERY_VERBOSE 0

#if VERBOSE
@@ -351,13 +351,15 @@ EXPORT_SYMBOL(lockdep_on);
*/
static int class_filter(struct lock_class *class)
{
-#if 0
- /* Example */
+#if 1
if (class->name_version == 1 &&
- !strcmp(class->name, "lockname"))
+ !strcmp(class->name, "&buf->lock"))
return 1;
if (class->name_version == 1 &&
- !strcmp(class->name, "&struct->lockfield"))
+ !strcmp(class->name, "&port->buf.lock/1"))
+ return 1;
+ if (class->name_version == 1 &&
+ !strcmp(class->name, "routelock"))
return 1;
#endif
/* Filter everything else. 1 would be to allow everything else */
--
2.7.0