Re: [PATCH][RESEND 3] disassociate_ctty SMP fix

From: Russell King (rmk@arm.linux.org.uk)
Date: Wed Feb 05 2003 - 09:51:26 EST


Ok, what seems to be going on is as follows:

1. you kill 'dd' as per the bug report
2. the kernel closes the file descriptors that 'dd' had open as part of
   the do_exit() cleanup.
3. this calls tty_release(), which takes the BKL, and the tty layer
   does its necessary cleanup, setting filp->private_data to NULL.
   *note* the file descriptor is still on the tty->tty_files list.
4. tty_release() releases the BKL, at which point preemption occurs,
   switching to the master sshd process.
5. sshd closes the master end of the pty device, which causes
   tty_vhangup to be called.
6. do_tty_hangup scans the file descriptors attached to tty->tty_files,
   and calls tty_fasync for each.
7. we come across the file descriptor that 'dd' had open (since it is
   still on the tty->tty_files list).
8. tty_fasync looks at filp->private_data, and tty_paranoia_check finds
   it to be NULL, and so complains.

So, where do we take the file descriptor off the tty list? __fput() -
list_del(&file->f_list);

This problem could still be present on non-preempt configurations,
although to a lesser degree. As the code currently stands in both
2.4 and 2.5, we generally do not see any reschedules between tty_release
and __fput which would allow the above scenario to occur. Preempt
just provides the extra bait for the bug to show itself.

As to the fix, umm, yea. I'll get to that in a little while. The good
news is that we know what's happening now.

(note: the "scheduling while atomic" below is caused by the method I
used to discover this problem, and shows the point at which the pesky
reschedule happened in the tty code. pid135 is the dd process on the
pty slave end, pid130 is the sshd on the pty master end.)

tty_release: (pid=135) filp = c131ecb4 filp->private_data = c12d3000
bad: scheduling while atomic!
[<c0227d10>] (dump_stack+0x0/0x14)
[<c023cde8>] (schedule+0x0/0x3b4)
[<c023d19c>] (preempt_schedule+0x0/0x50)
[<c02f17b8>] (tty_release+0x0/0x110)
[<c0276230>] (__fput+0x0/0x1a8)
[<c02761f8>] (fput+0x0/0x38)
[<c0274864>] (filp_close+0x0/0x110)
[<c0242b0c>] (put_files_struct+0x0/0x110)
[<c024357c>] (do_exit+0x0/0x528)
[<c024a724>] (sig_exit+0x0/0x74)
[<c02268f8>] (do_signal+0x0/0x470)
[<c0226d68>] (do_notify_resume+0x0/0x34)
tty_release: (pid=130) filp = c131e984 filp->private_data = c1264000
do_tty_hangup: (pid=130) filp = c131ecb4, filp->private_data = 00000000
Warning: null TTY for (88:01) in tty_fasync
tty_release: (pid=130) done
tty_release: (pid=135) done

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Feb 07 2003 - 22:00:17 EST