Re: Q: sock output serialization

From: Henner Eisen (eis@baty.hanse.de)
Date: Fri Sep 15 2000 - 17:13:22 EST


Hi,

>>>>> "David" == David S Miller <davem@redhat.com> writes:

    David> It smells rotten to the core, can someone tell me
    David> exactly why reordering is strictly disallowed? I do not
    David> even know how other OSes can handle this properly since
    David> most, if not all, use the IRQ dynamic cpu targeting
    David> facilities of various machines so LAPB is by definition
    David> broken there too.

LAPB itself should be able to recover from reordering, although it is
not optimized for this. It simply discards any received out-of-sequence
frame. The discarded frames are retransmitted later (exactly like
frames that were discarded due to CRC errors).

The problem is the X.25 packet layer (layer 3). It assumes that
the LAPB layer has already fixed any lost or out-of-sequence frames
and therefore does not provide an error recovery mechanism of its own.
It will detect when frames are missing or out of sequence, but as it
cannot recover from such errors, it just initiates a reset procedure
(discarding all currently queued frames, setting the state machine to a
known state, and telling the network and the peer to do the same before
data transmission resumes). The upper layer is notified of the reset
event; the task of recovering from the packet loss is left to the upper layer.

    David> I sense that usually, LAPB handles this issue at a
    David> different level, in the hardware? Does LAPB specify how to
    David> maintain reliably delivery and could we hook into this
    David> "how" when we need to drop LAPB frames? Perhaps it is too
    David> late by the time netif_rx is dealing with it.

The LAPB protocol allows the peer to be flow-controlled. So, if it were
known in advance that netif_rx() would discard the frame, the receiver
could set its rx_busy condition. (The Linux software lapb module does not
support this, however, but that is a different matter.) From looking at
the netif_rx() source, it seems that CONFIG_NET_HW_FLOWCONTROL could
almost provide the necessary state information for flow-controlling the peer.

    David> LAPB sounds like quite a broken protocol at the moment...
    David> But I'm sure there are details which will emerge and clear
    David> this all up.

Well, not just at the moment, it has always been like this. Thus, as we
did not panic before, there is no reason to panic now either.
Actually, it's not the LAPB protocol itself that is broken, but the
way it is accessed from the X.25 packet layer (a reliable datalink
service is accessed via the unreliable dev_queue_xmit()/netif_rx()
interface). I always wondered why it was done like this. Probably
the possible problems were not realized during the early design stage
and did not show up in testing. (The problems might be unlikely to occur
in real-world scenarios. As real-world X.25 connections usually use only
slow links (a few kbyte/sec), it is very unlikely that the X.25 connection
itself caused the NET_RX queue to overrun. It might only be triggered
when the host is simultaneously flooded with other traffic from a local
high-speed LAN interface. Triggering SMP packet-reordering problems
with a slow X.25 link is probably even more unlikely.)

For drivers using the software lapb module implementation, the right fix
would obviously be to move the LAPB processing above the network interface.
(We will need to provide a function call interface between the X.25 packet
layer and the datalink layer anyway once LLC.2 from the Linux-SNA project
is merged, which should be supported by X.25 as well.)
However, for drivers which support intelligent controllers (with LAPB
in firmware) this is not an option and the problem will persist.

Henner



This archive was generated by hypermail 2b29 : Fri Sep 15 2000 - 21:00:26 EST