Re: Re[2]: Help needed on Serial driver issues

Theodore Y. Ts'o (tytso@MIT.EDU)
Thu, 22 Jan 1998 09:39:48 -0500


Date: Wed, 21 Jan 98 18:23:35 -0500
From: Shashi Ramamurthy <sramamurthy@equinox.com>

It is not a flow control problem. Flow control has been set to RTSCTS.
We are definitely not losing data at the driver/board level, and I have
implemented the throttle function for the line discipline to call in
the event of "read_buf" getting to 128 or below. The amount of space
left in tty->read_buf, as calculated in the driver input routine from
tty->read_head and tty->read_cnt, seems to be larger than when it is
calculated in the n_tty_receive_buf routine. This is causing the loss
of data, and I know this for sure because I am printing the value of
count at the end of the routine. This problem only starts happening
when a lot of ports are in use (on my 75 MHz box, about 48 ports or
more). The CPU idle time as reported by "top" drops to zero.

Err... why is your driver input routine trying to figure out how much
space the line discipline can take? First of all, if the line
discipline buffer is full, there's very little you can do except drop
characters on the floor. Secondly, if you're getting to the point
where the line discipline is full, flow control can't be working
correctly, since RTS should have been dropped a full 128 characters
earlier.

How often are you polling, and how are you implementing the flow
control? Are you perhaps ignoring the throttle message, and letting the
board drop RTS when its FIFO is full? You can do this, but it's rather
tricky to get right; I don't recommend it.

In any case, trying to figure out how many characters the line
discipline can take by measuring read_buf and read_cnt is surely wrong,
since that doesn't take into account how many characters are in the
flip buffer. What I suspect is happening is that when the CPU gets
really busy, the kernel skips flush_to_ldisc for a particular timer
tick. That means your driver input routine runs twice without
flush_to_ldisc running in between, and so your driver overestimates how
many characters it can send the line discipline, since it doesn't know
about the characters still sitting in the flip buffer.

The big question, though, is why you had enough characters in your
board's internal FIFOs that you were in danger of running out of
buffer space in the first place. If flow control were working
correctly, there should have been 128 bytes' worth of grace to empty
out your board's buffers and to let the other side stop transmitting.
Regardless of the bug in your driver's guess at how many characters
the line discipline could take, there's no reason for your driver to
make that estimation in the first place, since if flow control were
working correctly, it should never have come to that.

Finally --- a free hint. If you're writing a polling device driver,
there's no reason to use the flip buffers. The flip buffers are
designed to be used when a driver needs to minimize interrupt latency
(especially when the board is generating an interrupt for every
character), at the cost of increasing the latency that it takes to
actually process incoming characters. However, for polling drivers,
that's not an issue, since the board has already buffered the
characters once. So you can simply call tty->ldisc.receive_buf
directly.

If you want an example of how to do this right, see the Rocketport
driver in the latest 2.1 kernel (drivers/char/rocket.c, in
rp_do_receive). You'll note that rp_do_receive doesn't bother checking
how many characters are left in the line discipline, since if that
buffer has filled, there's nothing you can do about it.

- Ted

P.S. What application is running on all of these ports? Are they
running PPP, or is it some kind of user-mode login processes, or UUCP,
or something else? If CPU idle time is down to 0%, the machine is
obviously underpowered anyway, and the users are getting degraded
service one way or another. Granted, dropping characters is bad, but
if the CPU is maxed out, overall performance isn't going to be good no
matter what....