Re: [PATCH 4/9] dmaengine: dw-edma: HDMA: Add memory barrier before starting the DMA transfer in remote setup

From: Serge Semin
Date: Thu Jun 22 2023 - 12:22:36 EST


On Thu, Jun 22, 2023 at 05:12:03PM +0200, Köry Maincent wrote:
> On Wed, 21 Jun 2023 18:56:49 +0300
> Serge Semin <fancer.lancer@xxxxxxxxx> wrote:
>
> > > Thanks for you detailed answer, this was instructive.
> > > I will come back with more information if TLP flags are set.
> > > FYI the PCIe board I am currently working with is the one from Brainchip:
> > > Here is the driver:
> > > https://github.com/Brainchip-Inc/akida_dw_edma
> >
> > I've glanced at the driver a bit:
> >
> > 1. Nothing criminal I've noticed in the way the BARs are mapped. It's
> > done as it's normally done. pcim_iomap_regions() is supposed to map
> > with no additional optimization. So the caching seems irrelevant
> > in this case.
> >
> > 2. The probe() method performs some device iATU config:
> > akida_1500_setup_iatu() and akida_1000_setup_iatu(). I would have a
> > closer look at the way the inbound MWs setup is done.
> >
> > 3. akida_1000_iatu_conf_table contains comments about the APB bus. If
> > it's an internal device bus and both LPDDR and eDMA are accessible
> > over the same bus, then the re-ordering may happen there. If APB means
> > the well known Advanced Peripheral Bus, then it's a quite slow bus
> > with respect to the system interconnect and PCIe buses. If eDMA regs
> > and LL-memory buses are different then the last write to the LL-memory
> > might be indeed still pending while the doorbells update arrives.
> > Sending a dummy read to the LL-memory stalls the program execution
> > until a response arrive (PCIe MRd TLPs are non-posted - "send and wait
> > for response") which happens only after the last write to the
> > LL-memory finishes. That's probably why your fix with the dummy-read
> > works and why the delay you noticed is quite significant (4us).
> > Though it looks quite strange to put LPDDR on such slow bus.
> >
> > 4. I would have also had a closer look at the way the outbound MW is
> > configured in your PCIe host controller (whether it enables some
> > optimizations like Relaxed ordering and ID-based ordering).
> >
> > In anyway I would have got in touch with the FPGA designers whether
> > any of my suppositions correct (especially regarding 3.).
>

> Alright, thanks for your instructive review!
>
> In the HDMA driver point of view we can not know if the eDMA regs and the
> LL-memory will be in same bus in whatever future implementation. Of course it
> is the hardware designers who should be careful about having a fast bus and
> memory for the LL, but wouldn't it be more cautious to have this read?
> Just a small thought!

If we get assured that hardware with such problem exists (if you'll get
confirmation about the supposition 3. above) then we'll need to
activate your trick for that hardware only. Adding dummy reads for all
the remote eDMA setups doesn't look correct since it adds additional
delay to the execution path and especially seeing nobody has noticed
and reported such problem so far (for instance Gustavo didn't see the
problem on his device otherwise he would have fixed it).

So if assumption 3. is correct then I'd suggest the next
implementation: add a new dw_edma_chip_flags flag defined (a.k.a
DW_EDMA_SLOW_MEM), have it specified via the dw_edma_chip.flags field
in the Akida device probe() method and activate your trick only if
that flag is set.

-Serge(y)

>
> Köry