Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite

From: OGAWA Hirofumi
Date: Thu Sep 03 2009 - 07:30:58 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Tue, 1 Sep 2009, Rafael J. Wysocki wrote:
>> On Tuesday 01 September 2009, Mikael Pettersson wrote:
>> >
>> > Starting with 2.6.31-rc8 and reverting
>> >
>> > 85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
>> > d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
>> >
>> > in that order gives me a kernel that works on both x86 and powerpc64.
>> >
>> > So the bug is definitely limited to the pty buffering logic change.
>>
>> Thanks a lot for this information, adding some CCs to the list.
>
> Mikael, is there any way to get the gcc testsuite to show the "expected"
> vs "result" cases when the failures occur, so that we can see what the
> pattern is ("it drops one character every 8kB" or something like that).
>
> However, I get the feeling that it's really the same bug that
> OGAWA-san already fixed - and that his fix just doesn't always do 100%
> of the job.
>
> So what Ogawa did was to make sure that we flush any pending data whenever
> we're checking "do we have any data left". He did that by calling out to
> tty_flush_to_ldisc(), which should flush the data through to the ldisc.
>
> The keyword here being "should". In flush_to_ldisc(), we have at least one
> case where we say "we'll delay it a bit more":
>
>     if (!tty->receive_room) {
>         schedule_delayed_work(&tty->buf.work, 1);
>         break;
>     }
>
> and while I think this _should_ be ok (because if there is no
> receive-room, then we'll hopefully always return non-zero from
> "input_available_p()"), we do have this really odd case that the
> reader side will do "n_tty_set_room()" only _after_ having checked for
> input_available_p(), and so maybe we do sometimes trigger the case that
>
>  - input_available_p() tries to flush to the input buffer before checking
>    how much data is available, by calling 'tty_flush_to_ldisc()'
>
>  - but 'tty_flush_to_ldisc()' won't do anything, because tty->receive_room
>    is zero.
>
>  - so now input_available_p will say "I don't have any data", even though
>    there was data in the write buffers.
>
>  - we'll notice that the other end has hung up, and return EOF/EIO.
>
>  - which is very WRONG, because the other end may have hung up, but before
>    it did that, it wrote data that is still in the write queues, and we
>    should have returned that data.
>
> Anyway, I'm not at all sure that the "receive_room == 0" case can happen
> at all, but maybe it can. Ogawa-san?

Unless I'm missing something, I don't think this behavior differs much
from the old code. But I would need to check more deeply.

Hmm, if "receive_room == 0 && tty->read_cnt == 0" is possible, I wonder
why reverting the buffer handling fixes the problem.
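
To make the state in question concrete, here is a rough user-space
sketch of the sequence described above. It is a toy model, not kernel
code: struct toy_tty and the toy_*() functions are hypothetical
stand-ins invented for illustration, and they only mimic the quoted
"no receive_room" branch and the order of checks on the reader side.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few fields the scenario depends on. */
struct toy_tty {
	size_t pending;       /* bytes written but not yet pushed to the ldisc */
	size_t receive_room;  /* space the ldisc says it can accept */
	size_t read_cnt;      /* bytes already sitting in the ldisc read buffer */
	bool other_closed;    /* the other end of the pty has hung up */
};

/* Mimics the quoted branch: with no receive room, nothing is flushed now. */
static void toy_flush_to_ldisc(struct toy_tty *t)
{
	size_t n;

	if (t->receive_room == 0)
		return; /* the real code reschedules the work and breaks out */

	n = t->pending < t->receive_room ? t->pending : t->receive_room;
	t->pending -= n;
	t->receive_room -= n;
	t->read_cnt += n;
}

/* Flush first, then report whether the reader can see any data. */
static bool toy_input_available(struct toy_tty *t)
{
	toy_flush_to_ldisc(t);
	return t->read_cnt > 0;
}

/*
 * Returns the number of bytes handed to the reader, 0 if the reader
 * would have to wait, or -EIO if no input is visible and the other
 * end is closed.
 */
static long toy_read(struct toy_tty *t)
{
	long n;

	if (!toy_input_available(t)) {
		if (t->other_closed)
			return -EIO;
		return 0;
	}

	n = (long)t->read_cnt;
	t->read_cnt = 0;
	return n;
}
```

In this model, starting from pending = 5, receive_room = 0,
read_cnt = 0 and other_closed = true, toy_read() returns -EIO while the
5 pending bytes are never delivered. With receive_room = 4096 the same
call returns 5. Whether the first starting state is reachable in the
real code is exactly the open question.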

Well, anyway, I'd like to reproduce this on my machine. Could you tell
me the versions of the tools you are using? I guess that means the gcc
testsuite from the gcc source (svn revision?), expect, dejagnu, and
tcl. (BTW, I'm using debian testing. If it can be reproduced on kvm, I
can install the distro version which you are using.)

Thanks.
--
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>