Re: Hardware spec prevents optimal performance in device driver

From: Mason
Date: Sun May 10 2015 - 12:46:34 EST

Next message: Joe Perches: "Re: [PATCH RFC V2] checkpatch: flag split arithmetic operations with CHECK"
Previous message: Ryusuke Konishi: "Re: [PATCH 2/3] NILFS2: support NFSv2 export"
In reply to: Måns Rullgård: "Re: Hardware spec prevents optimal performance in device driver"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 10/05/2015 12:29, Måns Rullgård wrote:

> Mason writes:
>
>> One Thousand Gnomes wrote:
>>
>>> Mason wrote:
>>>
>>>> I'm writing a device driver for a serial-ish kind of device.
>>>> I'm interested in the TX side of the problem. (I'm working on
>>>> an ARM Cortex A9 system by the way.)
>>>>
>>>> There's a 16-byte TX FIFO. Data is queued to the FIFO by writing
>>>> {1,2,4} bytes to a TX{8,16,32} memory-mapped register.
>>>> Reading the TX_DEPTH register returns the current queue depth.
>>>>
>>>> The TX_READY IRQ is asserted when (and only when) TX_DEPTH
>>>> transitions from 1 to 0.
>>>
>>> If the last statement is correct then your performance is probably always
>>> going to suck unless there is additional invisible queueing beyond the
>>> visible FIFO.
>>
>> Do you agree with my assessment that the current semantics for
>> TX_READY lead to a race condition, unless we limit ourselves
>> to a single (atomic) write between interrupts?
>
> No. To get best throughput, you can simply busy-wait until TX_DEPTH
> indicates the FIFO is almost empty, then write a few words, but no more
> than you know fit in the FIFO. Repeat until all data has been written.
> Use the IRQ only to signal completion of the entire packet.
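
For reference, the polled scheme suggested above might look roughly like
the sketch below. It is a user-space simulation, not driver code: the
fifo_depth variable and the tx_depth_read(), tx8_write() and
sim_drain_one() helpers are hypothetical stand-ins for the real TX_DEPTH
and TX8 register accesses, and the low-water mark of 4 is an arbitrary
choice.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 16   /* 16-byte TX FIFO, as described in the thread */
#define LOW_WATER 4    /* assumption: refill once depth drops to this */

/* Simulated device state (hypothetical stand-in for the hardware). */
static unsigned fifo_depth;
static unsigned max_depth_seen;
static size_t bytes_on_wire;

/* Stand-in for reading the TX_DEPTH register. */
static unsigned tx_depth_read(void)
{
    return fifo_depth;
}

/* Stand-in for a 1-byte write to the TX8 register. */
static void tx8_write(unsigned char b)
{
    (void)b;
    assert(fifo_depth < FIFO_SIZE);   /* writing more would overflow */
    fifo_depth++;
    if (fifo_depth > max_depth_seen)
        max_depth_seen = fifo_depth;
}

/* Simulated transmitter: shifts one byte out of the FIFO. */
static void sim_drain_one(void)
{
    if (fifo_depth > 0) {
        fifo_depth--;
        bytes_on_wire++;
    }
}

/* Polled transmit: wait for "almost empty", then top up the FIFO. */
static void tx_polled(const unsigned char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        /* Busy-wait; real hardware would drain on its own here. */
        while (tx_depth_read() > LOW_WATER)
            sim_drain_one();

        /* Write no more than is known to fit. */
        unsigned room = FIFO_SIZE - tx_depth_read();
        for (; room > 0 && sent < len; room--)
            tx8_write(buf[sent++]);
    }

    /* A real driver would wait for TX_READY here instead of polling. */
    while (tx_depth_read() > 0)
        sim_drain_one();
}
```

In this simulation, calling tx_polled() on a 40-byte buffer fills the FIFO
three times (16, 12 and 12 bytes) and never exceeds its 16-byte capacity.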

Would you fill the FIFO with TX_READY disabled, or with all
interrupts masked?

I will show with pseudo-code where (I think) the race condition
breaks the algorithm you suggest. (When using IRQs, not busy wait.)

> If the transmit rate is low, you can save some CPU time by filling the
> FIFO, then sleeping until it should be almost empty, fill again, etc.

For one data point, the test app I have sets the TX rate to 128 kbps.
Thus, it takes 1 ms to transmit an entire queue (16 bytes = 128 bits).
The CPU runs at 100-1000 MHz, depending on the mood of cpufreq.
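
The arithmetic behind these numbers can be sketched as follows. The
helper names are made up for illustration, "kbps" is taken as 1000 bits
per second, and any framing overhead (start/stop bits and the like) is
ignored, which may not match the real device.

```c
#include <stdint.h>

#define FIFO_BYTES      16u
#define TX_BITS_PER_SEC 128000u   /* 128 kbps, assuming k = 1000 */

/* Microseconds needed to shift out nbytes at the given bit rate. */
static uint64_t drain_time_us(uint32_t nbytes, uint32_t bits_per_sec)
{
    return (uint64_t)nbytes * 8u * 1000000u / bits_per_sec;
}

/* CPU cycles that elapse in 'us' microseconds at clock rate cpu_hz. */
static uint64_t cycles_in_us(uint64_t us, uint64_t cpu_hz)
{
    return us * cpu_hz / 1000000u;
}
```

With these inputs, drain_time_us(FIFO_BYTES, TX_BITS_PER_SEC) is 1000 us,
which corresponds to 100,000 CPU cycles at 100 MHz and 1,000,000 cycles at
1 GHz. For the fill-then-sleep approach, sleeping until (say) 4 bytes
remain would mean waking after drain_time_us(12, TX_BITS_PER_SEC) = 750 us.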

> Whether busy-waiting or sleeping, this approach keeps the data flowing
> as fast as possible.
>
> With the hardware you describe, there is unfortunately a trade-off
> between throughput and CPU efficiency. You'll have to decide which is
> more important to you.

I can ask the hardware designer to change the behavior for the next
iteration of the SoC.

Regards.
