Re: [PATCH v6 1/1] dmaengine: switchtec-dma: Introduce Switchtec DMA engine PCI driver

From: Kelvin.Cao
Date: Fri Oct 06 2023 - 18:34:47 EST


On Fri, 2023-10-06 at 16:00 +0530, Vinod Koul wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
>
> On 05-10-23, 18:35, Kelvin.Cao@xxxxxxxxxxxxx wrote:
>
> > > > > +static struct dma_async_tx_descriptor *
> > > > > +switchtec_dma_prep_wimm_data(struct dma_chan *c, dma_addr_t
> > > > > dma_dst, u64 data,
> > > > > +                          unsigned long flags)
> > > >
> > > > can you please explain what this wimm data refers to...
> > > >
> > > > I think adding imm callback was a mistake, we need a better
> > > > justification for another user for this, who programs this,
> > > > what
> > > > gets
> > > > programmed here
> > >
> > > Sure. I think it's an alternative method to prep_mem and would be
> > > more
> > > convenient to use when the write is 8-byte and the data to be
> > > moved
> > > is
> > > not in a DMA mapped memory location. For example, we write to a
> > > doorbell register with the value from a local variable which is
> > > not
> > > associated with a DMA address to notify the receiver to consume
> > > the
> > > data, after confirming that the previously initiated DMA
> > > transactions
> > > of the data have completed. I agree that the use scenario would
> > > be
> > > very
> > > limited.
>
> Can you please explain more about this 'value' where is it derived
> from?
> Who programs it and how...

Sure. Think of a producer/consumer use case where the producer is a
host DMA client driver and the consumer is a PCIe EP. On the EP, there
is a memory-mapped data buffer for data receiving and a memory-mapped
doorbell buffer for triggering data consuming. Each time for a bulk
data transfer, the DMA client driver first DMA the data of size X to
the memory-mapped data buffer, then it DMA the value X to the doorbell
buffer to trigger data consumption on device. On receiving a doorbell
writing, the device starts to consume the data of size X from the data
buffer.

For the first DMA operation, the DMA client would use dma_prep_memory()
like what most DMA clients do. However, for the second transfer, value
X is held in a local variable like below.

u64 size_to_transfer;

In this case, the client would use dma_prep_wimm_data() to DMA value X
to the doorbell buffer, like below.

dma_prep_wimm_data(chan, dst_for_db_buffer, size_to_transfer, flag);

Hope this example explains the thing. People would argue that the
client could use the same dma_prep_memory() for the doorbell ringing. I
would agree, this API just adds an alternative way to do so when the
data is as little as 64 bit and it also saves a call site to
dma_alloc_coherent() to allocate a source DMA buffer just for holding
the 8-byte value, which is required by dma_prep_memcpy().


> > > > > +     /* set sq/cq */
> > > > > +     writel(lower_32_bits(swdma_chan->dma_addr_sq),
> > > > > &chan_fw-
> > > > > > sq_base_lo);
> > > > > +     writel(upper_32_bits(swdma_chan->dma_addr_sq),
> > > > > &chan_fw-
> > > > > > sq_base_hi);
> > > > > +     writel(lower_32_bits(swdma_chan->dma_addr_cq),
> > > > > &chan_fw-
> > > > > > cq_base_lo);
> > > > > +     writel(upper_32_bits(swdma_chan->dma_addr_cq),
> > > > > &chan_fw-
> > > > > > cq_base_hi);
> > > > > +
> > > > > +     writew(SWITCHTEC_DMA_SQ_SIZE, &swdma_chan-
> > > > > >mmio_chan_fw-
> > > > > > sq_size);
> > > > > +     writew(SWITCHTEC_DMA_CQ_SIZE, &swdma_chan-
> > > > > >mmio_chan_fw-
> > > > > > cq_size);
> > > >
> > > > what is write happening in the descriptor alloc callback, that
> > > > does
> > > > not
> > > > sound correct to me
> > >
> > > All the queue descriptors of a channel are pre-allocated, so I
> > > think
> > > it's proper to convey the queue address/size to hardware at this
> > > point.
> > > After this initialization, we only need to assign cookie in
> > > submit
> > > and
> > > update queue head to hardware in issue_pending.
>
> Sorry that is not right, you can prepare multiple descriptors and
> then
> submit. Only at submit is the cookie assigned which is in order, so
> this
> should be moved to when we start the txn and not in this call
>
The hardware assumes the SQ/CQ is a contiguous circular buffer of fix
sized element. And the above code passes the address and size of SQ/CQ
to the hardware. At this point hardware will do nothing but take note
of the SQ/CQ location/size. 

When do dma_issue_pending(), the actual SQ head write will occur to
update hardware with the current SQ head, as below:

writew(swdma_chan->head, &swdma_chan->mmio_chan_hw->sq_tail);


Thank you Vinod for your time!
Kelvin