Re: serial8250: bogus low_latency destabilizes kernel, need sanity check

From: Peter Hurley
Date: Tue Feb 04 2014 - 07:42:59 EST


On 02/03/2014 06:10 AM, One Thousand Gnomes wrote:
> On Sat, 01 Feb 2014 10:09:03 -0500
> Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>
>> On 01/14/2014 11:24 AM, Pavel Roskin wrote:
>>> Hi Alan,
>>>
>>> Quoting One Thousand Gnomes <gnomes@xxxxxxxxxxxxxxxxxxx>:
>>>
>>>>> Maybe we should unset the low_latency flag as soon as DMA fails? There
>>>>> are two flags, one is state->uart_port->flags and the other is
>>>>> port->low_latency. I guess we need to unset both.
>>>>
>>>> Well low latency and DMA are pretty much exclusive in the real world so
>>>> probably DMA ports shouldn't allow low_latency to be set at all in DMA
>>>> mode.
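
[For the record, refusing/clearing both flags when DMA is in use would
be a small change. A sketch -- uart_sanitize_low_latency() and the
dma_in_use flag are made up; the two fields are the ones named above:

	#include <linux/serial_core.h>

	/* Sketch only: clear both low_latency flags once the driver
	 * knows the port is using DMA.  "dma_in_use" stands in for
	 * however the driver tracks that.
	 */
	static void uart_sanitize_low_latency(struct uart_state *state,
					      bool dma_in_use)
	{
		if (dma_in_use) {
			state->uart_port->flags &= ~UPF_LOW_LATENCY;
			state->port.low_latency = 0;
		}
	}

Where exactly that belongs -- at startup, or when the DMA request
succeeds -- is the open question.]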

>> That's a useful insight. I assumed exactly the opposite.
>>
>> The meaning of low_latency has migrated since 2.6.28.

> Not really. The meaning of low latency was always "get the turnaround
> time for command/response protocols down as low as possible". DMA-driven
> serial usually reports a transfer completion on a watermark or a timeout,
> so it tends to work very badly within the Linux definition of 'low latency'
> for tty.
>
> What it does has certainly changed, but that's implementation detail.

I meant the meaning as interpreted by the kernel, not the ideal meaning
or its original intent.

>> Perhaps we should unconditionally unset low_latency (or remove it entirely).
>> Real low latency can be addressed by using the -RT kernel.

> Just saying "use -RT" would be a regression and would actually hurt quite
> a few annoying "simple protocol"-using tools for all sorts of control
> systems. We are talking about milliseconds, not microseconds, here.

Ok, fair enough.

[Although my gut feeling is that nominal overhead is more like sub-10 usec,
and only when the scheduler is I/O-bound does the worst case get near 1 msec.]

> The expected behaviour in low_latency is probably best described as
>
>     data arrives
>     processed
>     wakeup

low_latency cannot guarantee that data will be processed, only that
it will not wait.

Examples:
1) SLIP is changing the mtu size. In this case, data will be dropped:
since the net queue is stopped, no data is taken up, but any data
passed to receive_buf() is assumed to have been consumed.
2) tty buffers are being flushed. There may or may not be any data to
process, but there's no way to know without waiting.
3) termios is changing/has been changed. Depending on the line
discipline, data may or may not be processed until the termios change
completes.

etc.
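
[The "will not wait" part is literally just a direct call instead of
queued work in the flip buffer push path; roughly, from memory (a
sketch of the current tty_flip_buffer_push(), not verbatim):

	void tty_flip_buffer_push(struct tty_port *port)
	{
		struct tty_bufhead *buf = &port->buf;

		/* publish the bytes the driver just added */
		buf->tail->commit = buf->tail->used;

		if (port->low_latency)
			flush_to_ldisc(&buf->work);   /* process now, in this context */
		else
			schedule_work(&buf->work);    /* worker runs "soon" */
	}

Either way flush_to_ldisc() does the work; low_latency only removes the
scheduling delay, it cannot make the line discipline accept the data.]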

> and to avoid the case of
>
>     data arrives
>     queued for back end
>     [up to 10ms delay, but typically 1-2ms]
>     processed
>     wakeup
>
> which, multiplied over a 50,000 S-record download, is a lot of time.
>
> Everything else is not user visible so can be changed freely to get that
> assumption to work (including ending up not needing it in the first
> place).

> Getting tty to the point where everything but N_TTY canonical mode is a
> fast path would probably eliminate the need nicely - I don't know of any
> use cases that expect ICANON, ECHO or I*/O* processing for low latency.
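
[The mode test itself would be cheap -- essentially the raw-mode check
N_TTY already makes when termios changes. A sketch of the sort of gate
a fast path could use (the helper name is mine):

	#include <linux/tty.h>

	/* Hypothetical fast-path gate: no canonical mode, no echo, no
	 * signals, no input/output translation.  Sketch only.
	 */
	static bool ldisc_fastpath_ok(struct tty_struct *tty)
	{
		return !L_ICANON(tty) && !L_ECHO(tty) && !L_ISIG(tty) &&
		       !I_IXON(tty) && !I_ISTRIP(tty) && !I_PARMRK(tty) &&
		       !O_OPOST(tty);
	}

Knowing when it's safe to take that path is another matter.]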

Easier said than done.

For example, what happens if termios is changing?
Presumably, data cannot be processed at that time, so the line discipline
returns early without having processed the data. [The receive_buf() path
could, for instance, use trylocks and bail out rather than waiting.]
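
Something like this, assuming the line discipline takes
tty->termios_rwsem for reading (the function name is made up):

	#include <linux/tty.h>

	/* Sketch of the trylock idea: report 0 bytes consumed instead
	 * of sleeping when termios is mid-change, so the flip buffer
	 * work can retry later.
	 */
	static int receive_buf_nowait(struct tty_struct *tty,
				      const unsigned char *cp, char *fp,
				      int count)
	{
		if (!down_read_trylock(&tty->termios_rwsem))
			return 0;	/* termios changing; don't wait */

		/* ... normal receive processing of cp[0..count) ... */

		up_read(&tty->termios_rwsem);
		return count;
	}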

But then, what restarts the attempt to process the data, and can that wait?

Similarly for throttling. An unthrottle may be in progress; and even while
it is in progress, the condition that prompted the unthrottle may no longer
hold, so the port must be throttled again. Ok, that can't happen right now,
but then when?

Regards,
Peter Hurley


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/