[bisected] Re: tty lockdep trace

From: Mike Galbraith
Date: Sat Jun 03 2017 - 02:34:24 EST


On Wed, 2017-05-31 at 13:21 -0400, Dave Jones wrote:
> Just hit this during a trinity run.

925bb1ce47f429f69aad35876df7ecd8c53deb7e is the first bad commit
commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e
Author: Vegard Nossum <vegard.nossum@xxxxxxxxxx>
Date: Thu May 11 12:18:52 2017 +0200

tty: fix port buffer locking

tty_insert_flip_string_fixed_flag() is racy against itself when called
from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc()
workqueue path [2].

The problem is that port->buf.tail->used is modified without consistent
locking; the ioctl path takes tty->atomic_write_lock, whereas the workqueue
path takes ldata->output_lock.

We cannot simply take ldata->output_lock, since that is specific to the
N_TTY line discipline.

It might seem natural to try to take port->buf.lock inside
tty_insert_flip_string_fixed_flag() and friends (where port->buf is
actually used/modified), but this creates problems for flush_to_ldisc()
which takes it before grabbing tty->ldisc_sem, o_tty->termios_rwsem,
and ldata->output_lock.

Therefore, the simplest solution for now seems to be to take
tty->atomic_write_lock inside tty_port_default_receive_buf(). This lock
is also used in the write path [3] with a consistent ordering.

[1]: Call Trace:
tty_insert_flip_string_fixed_flag
pty_write
tty_send_xchar // down_read(&o_tty->termios_rwsem)
// mutex_lock(&tty->atomic_write_lock)
n_tty_ioctl_helper
n_tty_ioctl
tty_ioctl // down_read(&tty->ldisc_sem)
do_vfs_ioctl
SyS_ioctl

[2]: Workqueue: events_unbound flush_to_ldisc
Call Trace:
tty_insert_flip_string_fixed_flag
pty_write
tty_put_char
__process_echoes
commit_echoes // mutex_lock(&ldata->output_lock)
n_tty_receive_buf_common
n_tty_receive_buf2
tty_ldisc_receive_buf // down_read(&o_tty->termios_rwsem)
tty_port_default_receive_buf // down_read(&tty->ldisc_sem)
flush_to_ldisc // mutex_lock(&port->buf.lock)
process_one_work

[3]: Call Trace:
tty_insert_flip_string_fixed_flag
pty_write
n_tty_write // mutex_lock(&ldata->output_lock)
// down_read(&tty->termios_rwsem)
do_tty_write (inline) // mutex_lock(&tty->atomic_write_lock)
tty_write // down_read(&tty->ldisc_sem)
__vfs_write
vfs_write
SyS_write

The bug can result in about a dozen different crashes depending on what
exactly gets corrupted when port->buf.tail->used points outside the
buffer.

The patch passes my LOCKDEP/PROVE_LOCKING testing but more testing is
always welcome.

Found using syzkaller.

Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Vegard Nossum <vegard.nossum@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

:040000 040000 6df9d7d2f22ff6ac1ad265869c1cb0d15621a0eb 56e5e66a5e451bf55857025899f4a092b51dca6e M drivers

git bisect start
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect good a351e9b9fc24e982ec2f0e76379a49826036da12
# bad: [104c08ba8e921ef97abfdc10408d54921a6d9003] Merge tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 104c08ba8e921ef97abfdc10408d54921a6d9003
# good: [16a12fa9aed176444fc795b09e796be41902bb08] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 16a12fa9aed176444fc795b09e796be41902bb08
# good: [2d3e4866dea96b0506395b47bfefb234f2088dac] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 2d3e4866dea96b0506395b47bfefb234f2088dac
# good: [f94c128eefcce2e3448d543f13cd7d7b8aa660a5] Merge tag 'metag-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
git bisect good f94c128eefcce2e3448d543f13cd7d7b8aa660a5
# good: [18365225f0440d09708ad9daade2ec11275c3df9] hwpoison, memcg: forcibly uncharge LRU pages
git bisect good 18365225f0440d09708ad9daade2ec11275c3df9
# good: [5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11] nvme_fc: Support ctrl_loss_tmo
git bisect good 5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11
# good: [d024baa58a4a7e5eb6058017771d15b9e47b56db] Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d024baa58a4a7e5eb6058017771d15b9e47b56db
# bad: [393bcfaeb8be7f46a4cd7d673e33541ebee76b12] Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
git bisect bad 393bcfaeb8be7f46a4cd7d673e33541ebee76b12
# bad: [b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08] kthread: fix boot hang (regression) on MIPS/OpenRISC
git bisect bad b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08
# good: [6f68a6ae1f0ea2fd87e26dc345866e928b5850a8] Merge tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 6f68a6ae1f0ea2fd87e26dc345866e928b5850a8
# bad: [da4c279942b05727088774df224c0734688b4cbc] serial: enable serdev support
git bisect bad da4c279942b05727088774df224c0734688b4cbc
# good: [5c9d6abed9e0a061de252a53ab687a1171502e81] serial: altera_jtaguart: adding iounmap()
git bisect good 5c9d6abed9e0a061de252a53ab687a1171502e81
# bad: [925bb1ce47f429f69aad35876df7ecd8c53deb7e] tty: fix port buffer locking
git bisect bad 925bb1ce47f429f69aad35876df7ecd8c53deb7e
# good: [11d4d32158eeaf36fe1073a4a260193d7a19ccf1] tty: ehv_bytechan: clean up init error handling
git bisect good 11d4d32158eeaf36fe1073a4a260193d7a19ccf1
# first bad commit: [925bb1ce47f429f69aad35876df7ecd8c53deb7e] tty: fix port buffer locking

[ 56.727594] ======================================================
[ 56.728482] WARNING: possible circular locking dependency detected
[ 56.729318] 4.12.0-master #108 Tainted: G E
[ 56.730125] ------------------------------------------------------
[ 56.731024] kworker/u8:3/329 is trying to acquire lock:
[ 56.731774] (&buf->lock){+.+...}, at: [<ffffffff81423e97>] tty_buffer_flush+0x37/0xa0
[ 56.732773]
but task is already holding lock:
[ 56.733444] (&o_tty->termios_rwsem/1){++++..}, at: [<ffffffff8141fb91>] isig+0x41/0xf0
[ 56.734303]
which lock already depends on the new lock.

[ 56.734868]
the existing dependency chain (in reverse order) is:
[ 56.735377]
-> #2 (&o_tty->termios_rwsem/1){++++..}:
[ 56.735799] lock_acquire+0xbd/0x200
[ 56.736071] down_read+0x4c/0x70
[ 56.736323] n_tty_write+0x90/0x450
[ 56.736584] tty_write+0x159/0x2e0
[ 56.736844] __vfs_write+0x28/0x140
[ 56.737119] vfs_write+0xc4/0x1c0
[ 56.737376] redirected_tty_write+0x5c/0xa0
[ 56.737681] __do_readv_writev+0x2b4/0x350
[ 56.737980] do_readv_writev+0x63/0x80
[ 56.738263] vfs_writev+0x3c/0x50
[ 56.738513] do_writev+0x55/0xd0
[ 56.738758] SyS_writev+0x10/0x20
[ 56.739009] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 56.739345]
-> #1 (&tty->atomic_write_lock){+.+.+.}:
[ 56.739760] lock_acquire+0xbd/0x200
[ 56.740026] __mutex_lock+0x75/0x8e0
[ 56.740297] mutex_lock_nested+0x1b/0x20
[ 56.740585] tty_port_default_receive_buf+0x48/0x90
[ 56.740932] flush_to_ldisc+0x7f/0xa0
[ 56.741233] process_one_work+0x1c5/0x5c0
[ 56.741528] worker_thread+0x4d/0x3c0
[ 56.741800] kthread+0x114/0x150
[ 56.742046] ret_from_fork+0x2a/0x40
[ 56.742318]
-> #0 (&buf->lock){+.+...}:
[ 56.742664] __lock_acquire+0x13e1/0x1400
[ 56.742957] lock_acquire+0xbd/0x200
[ 56.743228] __mutex_lock+0x75/0x8e0
[ 56.743496] mutex_lock_nested+0x1b/0x20
[ 56.743784] tty_buffer_flush+0x37/0xa0
[ 56.744068] pty_flush_buffer+0x27/0x80
[ 56.744462] tty_driver_flush_buffer+0x1b/0x20
[ 56.744776] isig+0x95/0xf0
[ 56.744999] n_tty_receive_signal_char+0x1c/0x70
[ 56.745338] n_tty_receive_char_special+0x863/0xb00
[ 56.745660] n_tty_receive_buf_common+0x7ae/0xac0
[ 56.746013] n_tty_receive_buf2+0x14/0x20
[ 56.746311] tty_ldisc_receive_buf+0x23/0x50
[ 56.746598] tty_port_default_receive_buf+0x59/0x90
[ 56.746960] flush_to_ldisc+0x7f/0xa0
[ 56.747237] process_one_work+0x1c5/0x5c0
[ 56.747509] worker_thread+0x4d/0x3c0
[ 56.747761] kthread+0x114/0x150
[ 56.748016] ret_from_fork+0x2a/0x40
[ 56.748300]
other info that might help us debug this:

[ 56.748835] Chain exists of:
&buf->lock --> &tty->atomic_write_lock --> &o_tty->termios_rwsem/1

[ 56.749581] Possible unsafe locking scenario:

[ 56.749994] CPU0 CPU1
[ 56.750294] ---- ----
[ 56.750564] lock(&o_tty->termios_rwsem/1);
[ 56.750855] lock(&tty->atomic_write_lock);
[ 56.751314] lock(&o_tty->termios_rwsem/1);
[ 56.751808] lock(&buf->lock);
[ 56.752021]
*** DEADLOCK ***

[ 56.752380] 6 locks held by kworker/u8:3/329:
[ 56.752641] #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8108861f>] process_one_work+0x13f/0x5c0
[ 56.753203] #1: ((&buf->work)){+.+...}, at: [<ffffffff8108861f>] process_one_work+0x13f/0x5c0
[ 56.753717] #2: (&port->buf.lock/1){+.+...}, at: [<ffffffff81423935>] flush_to_ldisc+0x25/0xa0
[ 56.754243] #3: (&tty->ldisc_sem){++++.+}, at: [<ffffffff81422ccf>] tty_ldisc_ref+0x1f/0x60
[ 56.754748] #4: (&tty->atomic_write_lock){+.+.+.}, at: [<ffffffff814242b8>] tty_port_default_receive_buf+0x48/0x90
[ 56.755373] #5: (&o_tty->termios_rwsem/1){++++..}, at: [<ffffffff8141fb91>] isig+0x41/0xf0
[ 56.755875]
stack backtrace:
[ 56.756139] CPU: 2 PID: 329 Comm: kworker/u8:3 Tainted: G E 4.12.0-master #108
[ 56.756637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 56.757327] Workqueue: events_unbound flush_to_ldisc
[ 56.757629] Call Trace:
[ 56.757830] dump_stack+0x85/0xc9
[ 56.758048] print_circular_bug+0x1f9/0x207
[ 56.758315] __lock_acquire+0x13e1/0x1400
[ 56.758564] ? check_irq_usage+0x83/0xb0
[ 56.758882] lock_acquire+0xbd/0x200
[ 56.759113] ? tty_buffer_flush+0x37/0xa0
[ 56.759463] ? tty_buffer_flush+0x37/0xa0
[ 56.759737] __mutex_lock+0x75/0x8e0
[ 56.759998] ? tty_buffer_flush+0x37/0xa0
[ 56.760250] ? isig+0x5e/0xf0
[ 56.760433] ? tty_buffer_flush+0x37/0xa0
[ 56.760677] ? find_held_lock+0x35/0xa0
[ 56.760957] ? isig+0x8d/0xf0
[ 56.761140] mutex_lock_nested+0x1b/0x20
[ 56.761383] tty_buffer_flush+0x37/0xa0
[ 56.761617] pty_flush_buffer+0x27/0x80
[ 56.761870] tty_driver_flush_buffer+0x1b/0x20
[ 56.762162] isig+0x95/0xf0
[ 56.762340] n_tty_receive_signal_char+0x1c/0x70
[ 56.762619] n_tty_receive_char_special+0x863/0xb00
[ 56.762960] n_tty_receive_buf_common+0x7ae/0xac0
[ 56.763250] ? tty_ldisc_ref+0x1f/0x60
[ 56.763478] n_tty_receive_buf2+0x14/0x20
[ 56.763721] tty_ldisc_receive_buf+0x23/0x50
[ 56.763983] ? mutex_lock_nested+0x1b/0x20
[ 56.764238] tty_port_default_receive_buf+0x59/0x90
[ 56.764533] flush_to_ldisc+0x7f/0xa0
[ 56.764756] process_one_work+0x1c5/0x5c0
[ 56.765075] ? process_one_work+0x13f/0x5c0
[ 56.765366] worker_thread+0x4d/0x3c0
[ 56.765590] kthread+0x114/0x150
[ 56.765787] ? process_one_work+0x5c0/0x5c0
[ 56.766094] ? kthread_create_on_node+0x40/0x40
[ 56.766374] ret_from_fork+0x2a/0x40