Re: [PATCH] kdesu broken

From: Alan Cox
Date: Wed Jul 29 2009 - 04:59:32 EST


> > - The driver ensures that it will not call
> > tty_flip_buffer_push/flush_to_ldisc again for this port until re-opened
>
> That's just bogus.

I didn't invent it, thats how it works and its not an area I've touched.

> If that is wrong as per hangup code, then the bug is in the hangup
> handling, not in the tty_flush_to_ldisc().

I wouldn't argue with that - I merely pointed out they need synchronizing
>
> > - The driver calls tty_hangup
> > - tty_hangup ensures that tty_flip_buffer_push cannot occur again
> > (by killing the workqueue)
> > - resources may well then get freed before close()
>
> They had better not be, since all the data structures touched are inside
> the 'tty_struct' (which we're dereferencing in other ways anyway in that
> whole routine).

You are only looking at pty. That code is used for all the real physical
tty devices too. With real devices the underlying physical device and its
structures can get dumped. When you run the n_tty ldisc you call back out
to the drivers for echo etc.

> So the only thing that the hangup code needs to do is to make that the
> "tty->buf.work.work" function pointer is a nop. And it does, as far as I
> can tell.

What happens if the hangup occurs just after you start running the ldisc
on another CPU ?

> So regardless, by now we have moved from "trivial bug that bites people in
> real life" to "theoretical bug that looks impossible to trigger".

Actually all the hangup races turn up for people. Not often but now and
then. Also because you have vhangup() you can cause them in software by
leaving one app spinning in a loop hanging up and opening stuff while you
try and make it break.

> Well, put this way: the only thing that actually stops the outstanding
> timer (for the delayed work) is the tty_ldisc_halt() call in
> tty_ldisc_hangup(). If that _isn't_ called, then your argument is
> pointless, since the tty_flush_to_ldisc() will be done by a timer later
> (and Ogawa's patch thus clearly introduces nothing new).
>
> And when it _is_ called, it also clears TTY_LDISC, so now tty_ldisc_ref()
> will return NULL, so then flush_to_ldisc() will be a no-op.

IFF the hangup doesn't occur while you are entering flush_to_ldisc()

Consider a real tty for a bit

CPU0 CPU1
n_tty methods
flush_to_ldisc
get ldisc ref
INTERRUPT
tty_hangup
do_tty_hangup
ldisc work tty_ldisc_hangup
RESET_TERMIOS is false
tty->ops->hangup()
[usb]serial_hangup()
[usb]serial_do_down()
close physical
driver

tty->ops->write
[usb]serial_write
WARN()


So as I said before you need to fix flush_to_ldisc and the hangup running
against one another. At the very least I think you need a
tty_ldisc_wait_idle(tty); just before

if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {

so that you stall the hangup until n_tty exits the ldisc.


(The other case is calling ld->ops->hangup then calling ld->ops->write
which no ldisc seems to care about)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/