Re: Characters vanishing in the tty layer? (maybe related to [Bug#14388] keyboard under X with 2.6.31)

From: Michael Ellerman
Date: Tue Oct 20 2009 - 01:31:35 EST


On Tue, 2009-10-20 at 03:09 +0900, Linus Torvalds wrote:
>
> On Mon, 19 Oct 2009, Michael Ellerman wrote:
> >
> > As Benh said it's not really bisectable on our kernel. But I got Mikey
> > to bisect it on upstream using a different simulator model, and he
> > couldn't tie it down. It becomes easier to hit in more recent kernels
> > (since 27), but he could hit in 25 too.
>
> Ok, thanks to the verification.
>
> And I think I see why it got easier to hit lately, and to some degree I
> think we can at least partially avoid it:
>
> > * hvc_console reads all our input and passes it to the tty code
> > via tty_insert_flip_char()
> > * flush_to_ldisc() runs calling n_tty_receive_buf(), which fills
> > 4K of tty->read_buf
> > * Once read_buf is full, tty->receive_room becomes 0 and
> > flush_to_ldisc() reschedules itself to run again in 1 jiffy.
> > * Bash reads 1 character, causing receive_room to become 1.
> > * flush_to_ldisc() runs again and inserts 1 more char because
> > receive_room is now 1.
>
> .. ok, I agree that our behavior in the "buffer full" case is likely not
> wonderful. And that's especially true in the 'icanon' case..
>
> > * (repeat the last two steps a few times)
> > * Bash sets tty->icanon = 1 via n_tty_set_termios(), which calls
> > n_tty_set_room(). Because icanon is enabled, n_tty_set_room()
> > lies and says we have space for 1 character even though we
> > don't.
>
> So just to clarify - icanon wasn't set before?

It wasn't set before.

> > * flush_to_ldisc() runs, sees that receive_room is 1 and calls
> > n_tty_receive_buf()
> > * n_tty_receive_buf() calls n_tty_receive_char() which drops the
> > character because there's no room (~ line 1132).
> > * We keep dropping characters until we see a newline, which
> > increments tty->canon_data, causing n_tty_set_room() to report 0
> > space left, and so flush_to_ldisc() reschedules again.
>
> Did you have newlines in the big bulk dump?

Yes. It's not really a bulk dump, it's just a bit over 4K of shell
script that we're writing into the console to run a test. So lots of
newlines.

> If there really aren't any newlines, I don't think we can do a lot. icanon
> handling kind of fundamentally doesn't work if there is no newline at all,
> since it is all about line buffering, and we obviously have to limit the
> lines _somewhere_.

Right that makes sense. But shouldn't be the case here.

> But I also thihk that we only update that whole 'canon_data' thing if we
> _received_ the newlines while we were in icanon mode (but I didn't really
> check very closely), so if we actually switch from -icanon to +icanon, I
> think canon_data can be confused, and we thus handle the buffer-full case
> worse than we _could_ have.

Yeah I think that's right. n_tty_set_termios() always sets canon_data to
0 when there is a canon change, in either direction. And I don't see
anything else will fix it.

> > It's a bit unfortunate that n_tty_set_room() lies about the available
> > space when icanon = 1, but it makes sense in order to handle erase. It
> > would be nice if n_tty_receive_buf() returned a status to
> > flush_to_disc() to say "actually I could only fit X chars after all,
> > please take them back" - but I don't grok how that would play with all
> > the other logic in there.
>
> Yeah, I don't think that is even worth it. The thing is, we do have to
> start dropping characters at some point, so trying to extend the non-drop
> case just moves it somewhere else. If you are in canon mode, and the line
> is longer than <some-number-that-happens-to-be-4kB>, you're basically
> screwed.

Agreed.

> But what we _might_ do is to handle the "turn on canon mode" case better,
> in case the old buffer had newlines. IOW, instead of always setting
> 'canon_data' to 0 when icanon changes (and setting canon_head to the tail
> of the data), maybe we should simply _count_ the number of newlines (and
> set canon_head to the last one in the buffer).
>
> IOW, if you do have newlines in your bulk data before icanon got set, we
> could probably make n_tty handle that case better.

Yeah I think that is the root cause of my bug. There are plenty of lines
in my buffer, so n_tty_set_room() should not be accepting more data.

> That said, as far as I know, the tty layer behavior in this area has
> basically always been the way it is. The fact that you can see it more
> easily is almost certainly just related to just how the buffers that lead
> up to flush_to_ldisc have grown, and are now allowed to fill up further.
> So it probably got much easier to trigger, but it is likely not in any way
> a really new case, and goes back not just to 2.6.25, but probably
> forever...

Yeah, we only stopped at 25 because our toolchain was too new :}

> I wonder a bit what in your particular environment makes this problem show
> up, but I assume that your simulation environment ends up just blindly
> stuffing the tty buffers through the virtual console, so you basically
> have "simulated typing" that was (a) started long before the reader was
> interested in accepting it and (b) was a huge dump rather than what any
> real load would be.

Yes and no. (a) doesn't seem to apply, no amount of delay before
starting the input makes a difference. And (b) only sort of. I don't
think a bit more than 4K is a ridiculous amount to stuff down a console.
Though I guess we're "typing" at a ridiculous speed.

> But if you change n_tty_set_termios() to count newlines when it sets
> icanon, you might get the behaviour you want, and it would seem to me to
> be an improvement over what we have now, so I wouldn't object to it, even
> if I suspect nobody else has ever cared, and would ever care in the
> future.

Clearly no one has ever cared before :) I'll have a look if I can come
up with a neat fix.

cheers

Attachment: signature.asc
Description: This is a digitally signed message part