Re: (EXT) Re: [PATCH v9 RESEND 01/13] spi: imx: add dma_sync_sg_for_device after fallback from dma

From: Matthias Schiffer
Date: Tue Jun 09 2020 - 06:09:22 EST


On Tue, 2020-06-09 at 11:00 +0100, Robin Murphy wrote:
> On 2020-06-09 06:21, Robin Gong wrote:
> > On 2020/06/09 0:44 Robin Murphy <robin.murphy@xxxxxxx> wrote:
> > > On 2020-06-08 16:31, Mark Brown wrote:
> > > > On Mon, Jun 08, 2020 at 03:08:45PM +0000, Robin Gong wrote:
> > > >
> > > > > > > + if (transfer->rx_sg.sgl) {
> > > > > > > + struct device *rx_dev = spi->controller-
> > > > > > > >dma_rx->device->dev;
> > > > > > > +
> > > > > > > + dma_sync_sg_for_device(rx_dev, transfer-
> > > > > > > >rx_sg.sgl,
> > > > > > > + transfer->rx_sg.nents,
> > > > > > > DMA_TO_DEVICE);
> > > > > > > + }
> > > > > > > +
> > > > > > This is confusing - why are we DMA mapping to the device
> > > > > > after doing
> > > > > > a PIO transfer?
> > > > > 'transfer->rx_sg.sgl' condition check that's the case
> > > > > fallback PIO
> > > > > after DMA transfer failed. But the spi core still think the
> > > > > buffer
> > > > > should be in 'device' while spi driver touch it by PIO(CPU),
> > > > > so sync it back to
> > >
> > > device to ensure all received data flush to DDR.
> > > >
> > > > So we sync it back to the device so that we can then do another
> > > > sync
> > > > to CPU? TBH I'm a bit surprised that there's a requirement
> > > > that we
> > > > explicitly undo a sync and that a redundant double sync in the
> > > > same
> > > > direction might be an issue but I've not had a need to care so
> > > > I'm
> > > > perfectly prepared to believe there is.
> > > >
> > > > At the very least this needs a comment.
> > >
> > > Yeah, something's off here - at the very least, syncing with
> > > DMA_TO_DEVICE on
> > > the Rx buffer that was mapped with DMA_FROM_DEVICE is clearly
> > > wrong.
> > > CONFIG_DMA_API_DEBUG should scream about that.
> > >
> > > If the device has written to the buffer at all since dma_map_sg()
> > > was called
> > > then you do need a dma_sync_sg_for_cpu() call before touching it
> > > from a CPU
> > > fallback path, but if nobody's going to touch it from that point
> > > until it's
> > > unmapped then there's no point syncing it again. The
> > > my_card_interrupt_handler() example in DMA-API_HOWTO.txt
> > > demonstrates
> > > this.
> >
> > Thanks for you post, but sorry, that's not spi-imx case now,
> > because the rx data in device memory is not truly updated from
> > 'device'/DMA, but from PIO, so that dma_sync_sg_for_cpu with
> > DMA_FROM_DEVICE can't be used, otherwise the fresh data in cache
> > will be invalidated.
> > But you're right, kernel warning comes out if CONFIG_DMA_API_DEBUG
> > enabled...
>
> Ah, I think I understand what's going on now. That's... really ugly
> :(
>
> Looking at the SPI core code, I think a better way to handle this
> would
> be to have your fallback path call spi_unmap_buf() directly (or
> perform
> the same actions, if exporting that to drivers is unacceptable),
> then
> make sure ->can_dma() returns false after that such that
> spi_unmap_msg()
> won't try to unmap it again. That's a lot more reasonable than trying
> to
> fake up a DMA_TO_DEVICE transfer in the middle of a DMA_FROM_DEVICE
> operation on the same buffer.
>
> Alternatively, is it feasible to initiate a dummy DMA request during
> probe, such that you can detect the failure condition and give up on
> the
> DMA channel early, and not have to deal with it during a real SPI
> transfer?
>
> Robin.


Would this cover the transient DMA failure that is happening between
SDMA registration and firmware load? This is exactly the case for which
the PIO fallback is triggered for us: As soon as the SDMA driver is
registered, the SPI driver can be probed as well, usually failing its
first DMA transfer, as the SDMA firmware is not loaded yet. We would
still like the SPI controller to use DMA as soon as it's actually
available.

I assume the actual issue is that the SDMA controller is considered
registered before the firmware load has finished, but I have no idea
how feasible it would be to change that (some comments in the code
explain why this currently isn't the case).

Matthias