Re: n_tty: Check the other end of pty pair before returning EAGAIN on a read()

From: Marc Aurele La France
Date: Thu Dec 10 2015 - 17:49:06 EST


On Thu, 10 Dec 2015, Peter Hurley wrote:
> On 12/09/2015 01:06 PM, Marc Aurele La France wrote:
>> The following four commits are some of the changes that have been made
>> to the tty layer since kernel version 3.11:

>> 1) f95499c3030fe1bfad57745f2db1959c5b43dca8
>> n_tty: Don't wait for buffer work in read() loop

>> 2) f8747d4a466ab2cafe56112c51b3379f9fdb7a12
>> tty: Fix pty master read() after slave closes

>> 3) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73
>> pty, n_tty: Simplify input processing on final close

>> 4) 1a48632ffed61352a7810ce089dc5a8bcd505a60
>> pty: Fix input race when closing

>> Indeed, the previous code (before commit "1)") checked the other end
>> of the pty pair for any data before returning EAGAIN. This mimics the
>> behaviour of other System-V variants (Solaris, AIX, etc.)

> What other SysV systems were tested?

The fix for mindrot bug 52 was verified to correct the issue reported
there on various versions of SunOS, OSF1, HP-UX, IRIX, Tru64 UNIX, AIX,
OpenSolaris, and a number of Linux distributions. There was a no-go
report regarding HP-UX 11.11 that was never followed up on, but I
believe it to have been resolved by the fix for mindrot bug 1467 (for
systems where EAGAIN != EWOULDBLOCK).

>> in this
>> regard and ensured that EAGAIN really did mean no data was available
>> at the time of the call.

>> After sshd has been SIGCHLD'ed about the shell's termination, it
>> continues to read the master pty until an error occurs. This error
>> will be EIO if no process has the slave pty open. Otherwise (for
>> example when the shell spawned long-running processes in the
>> background before terminating), that error is expected to be EAGAIN.
>> sshd cannot continue to read until an EIO in all cases, because doing
>> so causes the session to hang until all processes have closed the
>> slave pty, which is not the desired behaviour. Thus a spurious EAGAIN
>> return causes sshd to lose data, whether or not the slave pty is
>> completely closed.

> Ah, the games userspace will be up to :)

Not really. The fact different OSes behave differently in this regard can
hardly be said to be userland's fault. The lower the number of distinct
behaviours userland needs to deal with, the better. Furthermore, sshd
"knows" there should be data there, so it makes no sense to befuddle it
with false EAGAIN returns.

>> --- a/drivers/tty/n_tty.c
>> +++ a/drivers/tty/n_tty.c
>> @@ -2281,20 +2281,26 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
>> if (!timeout)
>> break;
>> if (file->f_flags & O_NONBLOCK) {
>> - retval = -EAGAIN;
>> - break;
>> - }
>> - if (signal_pending(current)) {
>> - retval = -ERESTARTSYS;
>> - break;
>> - }
>> - up_read(&tty->termios_rwsem);
>> + up_read(&tty->termios_rwsem);
>> + flush_work(&tty->port->buf.work);
>> + down_read(&tty->termios_rwsem);
>> + if (!input_available_p(tty, 0)) {
>> + retval = -EAGAIN;
>> + break;
>> + }
>> + } else {
>> + if (signal_pending(current)) {
>> + retval = -ERESTARTSYS;
>> + break;
>> + }
>> + up_read(&tty->termios_rwsem);

> No sense in doing this just for O_NONBLOCK; might as well do it before
> all the condition tests.

> Which renders the earlier fixes for the slave end closing superfluous,
> so might as well rip those out.

This strikes me as somewhat draconian. It seems to me the intent behind
commit "1)" was in part to speed things up a bit. I beleive that goal to
have been largely attained. It's just a matter of ensuring this speedup
doesn't change kernel<->userland semantics. I see your EIO fix and the
eventual solution to the problem here in that light.

> n_tty_poll() will need to be fixed as well, because if one application
> used read() with O_NONBLOCK to expect to block until i/o became available,
> then I guarantee some other application uses poll() with no timeout
> for the same purpose.

Agreed. which is another reason why I don't believe my suggestion to be
the correct fix. Also, I took somewhat of a hammer approach, meaning
that, for OpenSSH's purposes, it would be sufficient to remove spurious
EAGAIN only after disassociation of the slave pty.

>>
>> - timeout = wait_woken(&wait, TASK_INTERRUPTIBLE,
>> - timeout);
>> + timeout = wait_woken(&wait, TASK_INTERRUPTIBLE,
>> + timeout);
>>
>> - down_read(&tty->termios_rwsem);
>> - continue;
>> + down_read(&tty->termios_rwsem);
>> + continue;
>> + }
>> }
>>
>> if (ldata->icanon && !L_EXTPROC(tty)) {
>>

Thanks.

Marc.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/