Re: serial driver / tty issues

From: Benjamin Herrenschmidt
Date: Fri Mar 05 2004 - 06:15:11 EST


On Fri, 2004-03-05 at 20:41, Russell King wrote:
> On Fri, Mar 05, 2004 at 08:18:56PM +1100, Benjamin Herrenschmidt wrote:
> > - most/all serial drivers, when the flip buffer is full, will
> > call tty->flip.work.func() directly with the spinlock held. This is
> > asking for trouble. I have reproduceable cases where that cause the
> > tty layer to try to echo, thus calling back the serial_core
> > uart_put_char() which will try to ... take the spinlock. Dead.
>
> Yep - I think we need to drop the spinlock, but by dropping it we need
> to check that stuff is still sane after re-acquiring it.

Yup. I hacked something checking my driver was still open, not
sure it's robust enough at this point though.

> > - what about the call to tty_flip_buffer_push() done by all
> > drivers with the lock held too ? It's fine as long as we don't
> > have this low_latency thing set. I suppose nothing but the driver
> > itself will set it but I got a bit lost in the serial_core, can
> > you just confirm that is ok ?
>
> Again, we should drop the spinlock and re-acquire it afterwards.

For that one, I don't need to drop it. I'm just returning from
the receive_chars to the irq indicating the main irq routine
that something happened. It then calls tty_flip_buffer_push()
after releasing the lock.

> > - I had a couple of times a crash in n_tty_receive_buf() called
> > from keventd (from ldisc flip workqueue), apparently racing with
> > a close of the port. The scenario is that the close happens, i
> > get out of my driver back to serial core which goes back to
> > tty_release afaik. At that point (I'm not sure exactly when, maybe
> > in the flush of the pending work queues that is done there, maybe
> > just on the other CPU), the pending work queue is triggered since
> > our input buffer is still full of crap.
> > It reliably oopses trying to derefence 0 (writing a byte, it's not
> > a memcpy, without a spinlocked region, I haven't spotted exactly
> > where in n_tty_receive_buf(), this function is shit to disassemble
> > as it seems to get a ton of things inlined).
>
> Well, there does seem to be a race in there in the tty layer. We
> appear to close down the ldisc, and fiddle about with some other
> things, and eventually cancel the work queue.
>
> The n_tty close method frees the tty->read_buf, and n_tty_receive_buf()
> references said buffer. If the timing is right, *boom*.

Yup.

> We should cancel the work queue earlier, so we can guarantee that we
> won't call the ldisc functions after we've closed them down.

Yup.

> I guess the TTY layer still needs a complete top to bottom overhaul...

I was afraid you would say that ... :)

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/