Re: tty: deadlock between n_tracerouter_receivebuf and flush_to_ldisc

From: Peter Hurley
Date: Wed Jan 20 2016 - 15:47:23 EST


On 01/20/2016 08:08 AM, Peter Hurley wrote:
> On 01/20/2016 05:02 AM, Peter Zijlstra wrote:
>> On Wed, Dec 30, 2015 at 11:44:01AM +0100, Dmitry Vyukov wrote:
>>> -> #3 (&buf->lock){+.+...}:
>>> [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0 kernel/locking/lockdep.c:3585
>>> [< inline >] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:112
>>> [<ffffffff85c8e790>] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
>>> [<ffffffff82b8c050>] tty_get_pgrp+0x20/0x80 drivers/tty/tty_io.c:2502
>>
>> So in any recent code that I look at this function tries to acquire
>> tty->ctrl_lock, not buf->lock. Am I missing something ?!
>
> Yes.
>
> The tty locks were annotated with __lockfunc so were being elided from lockdep
> stacktraces. Greg has a patch in his queue from me that removes the __lockfunc
> annotation ("tty: Remove __lockfunc annotation from tty lock functions").
>
> Unfortunately, I think syzkaller's post-processing stack trace isn't helping
> either, giving the impression that the stack is still inside tty_get_pgrp().
>
> It's not.
>
> It's in pty_flush_buffer(), which is taking the 'other tty' buf->lock.
>
> Looks to me like the lock inversion is caused by the tty_driver_flush_buffer()
> in n_tracerouter_open()/_close(), but I need to look at this mess a little
> closer.

Unfortunately, there's not enough information in the lockdep report to
conclusively determine if there really is a potential for deadlock
(for one, the reports don't show what locks are held by each trace which
is a problem if one of the traces terminates at a global mutex while holding
subclassed locks).

Nevertheless, the patch below should "fix" the problem.

Regards,
Peter Hurley

--- >% ---
Subject: [PATCH] tty: Fix lock inversion in N_TRACEROUTER

n_tracerouter_open()/_close() may cause a lock inversion when
N_TRACEROUTER is set as the ldisc for a master pty.

Unfortunately, the original lockdep report [1] does not contain
enough information to conclusively show a deadlock is possible.

However, the call to tty_driver_flush_buffer() is completely pointless
as this line discipline does not allow and never performs output to
this tty driver; remove.

[1] Email subject "tty: deadlock between n_tracerouter_receivebuf and
flush_to_ldisc" https://lkml.org/lkml/2015/12/30/71

Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Signed-off-by: Peter Hurley <peter@xxxxxxxxxxxxxxxxxx>
---
drivers/tty/n_tracerouter.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c
index ac57169..a5fd45d 100644
--- a/drivers/tty/n_tracerouter.c
+++ b/drivers/tty/n_tracerouter.c
@@ -82,7 +82,6 @@ static int n_tracerouter_open(struct tty_struct *tty)
tr_data->opencalled = 1;
tty->disc_data = tr_data;
tty->receive_room = RECEIVE_ROOM;
- tty_driver_flush_buffer(tty);
retval = 0;
}
}
@@ -102,7 +101,6 @@ static void n_tracerouter_close(struct tty_struct *tty)

mutex_lock(&routelock);
WARN_ON(tptr->kref_tty != tr_data->kref_tty);
- tty_driver_flush_buffer(tty);
tty_kref_put(tr_data->kref_tty);
tr_data->kref_tty = NULL;
tr_data->opencalled = 0;
--
2.7.0