Re: [PATCH v9 13/13] media: ti: Add CSI2RX support for J721E

From: Jai Luthra
Date: Fri Aug 18 2023 - 06:27:17 EST


Hi Tomi,

Thanks for the review.

On Aug 15, 2023 at 16:00:51 +0300, Tomi Valkeinen wrote:
> On 11/08/2023 13:47, Jai Luthra wrote:
> > From: Pratyush Yadav <p.yadav@xxxxxx>
> >

...

> > +
> > +static void ti_csi2rx_drain_callback(void *param)
> > +{
> > + struct completion *drain_complete = param;
> > +
> > + complete(drain_complete);
> > +}
> > +
> > +/** Drain the stale data left at the PSI-L endpoint.
> > + *
> > + * This might happen if no buffers are queued in time but source is still
> > + * streaming. Or rarely it may happen while stopping the stream. To prevent
>
> I understand the first one, but when does this happen when stopping the
> stream?
>

When multi-stream support is enabled the module-level pixel reset for
cannot be done when stopping a single stream, in which case some
in-flight data is left at the PSI-L endpoint despite enforcing the
DMACNTX reset before. The same was true till v7 of this series as well,
due to the module-level pixel reset being done in the wrong order
(before stopping stream on the source).

Not sure if this will happen for single-stream usecases now (since v8)

I will fix this and other comments when I post subsequent patches for
multi-stream.

> > + * that stale data corrupting the subsequent transactions, it is required to
> > + * issue DMA requests to drain it out.
> > + */
> > +static int ti_csi2rx_drain_dma(struct ti_csi2rx_dev *csi)
> > +{
> > + struct dma_async_tx_descriptor *desc;
> > + struct completion drain_complete;
> > + dma_cookie_t cookie;
> > + int ret;
> > +
> > + init_completion(&drain_complete);
> > +
> > + desc = dmaengine_prep_slave_single(csi->dma.chan, csi->dma.drain.paddr,
> > + csi->dma.drain.len, DMA_DEV_TO_MEM,
> > + DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> > + if (!desc) {
> > + ret = -EIO;
> > + goto out;
> > + }
> > +
> > + desc->callback = ti_csi2rx_drain_callback;
> > + desc->callback_param = &drain_complete;
> > +
> > + cookie = dmaengine_submit(desc);
> > + ret = dma_submit_error(cookie);
> > + if (ret)
> > + goto out;
> > +
> > + dma_async_issue_pending(csi->dma.chan);
> > +
> > + if (!wait_for_completion_timeout(&drain_complete,
> > + msecs_to_jiffies(DRAIN_TIMEOUT_MS))) {
> > + dmaengine_terminate_sync(csi->dma.chan);
> > + ret = -ETIMEDOUT;
> > + goto out;
> > + }
> > +out:
> > + return ret;
> > +}
> > +
> > +static void ti_csi2rx_dma_callback(void *param)
> > +{
> > + struct ti_csi2rx_buffer *buf = param;
> > + struct ti_csi2rx_dev *csi = buf->csi;
> > + struct ti_csi2rx_dma *dma = &csi->dma;
> > + unsigned long flags;
> > +
> > + /*
> > + * TODO: Derive the sequence number from the CSI2RX frame number
> > + * hardware monitor registers.
> > + */
> > + buf->vb.vb2_buf.timestamp = ktime_get_ns();
> > + buf->vb.sequence = csi->sequence++;
> > +
> > + spin_lock_irqsave(&dma->lock, flags);
> > +
> > + WARN_ON(!list_is_first(&buf->list, &dma->submitted));
> > + vb2_buffer_done(&buf->vb.vb2_buf, VB2_BUF_STATE_DONE);
> > + list_del(&buf->list);
> > +
> > + /* If there are more buffers to process then start their transfer. */
> > + while (!list_empty(&dma->queue)) {
> > + buf = list_entry(dma->queue.next, struct ti_csi2rx_buffer, list);
> > +
> > + if (ti_csi2rx_start_dma(csi, buf)) {
> > + dev_err(csi->dev, "Failed to queue the next buffer for DMA\n");
> > + vb2_buffer_done(&buf->vb.vb2_buf, VB2_BUF_STATE_ERROR);
> > + } else {
> > + list_move_tail(&buf->list, &dma->submitted);
> > + }
> > + }
> > +
> > + if (list_empty(&dma->submitted))
> > + dma->state = TI_CSI2RX_DMA_IDLE;
> > +
> > + spin_unlock_irqrestore(&dma->lock, flags);
> > +}
> > +
> > +static int ti_csi2rx_start_dma(struct ti_csi2rx_dev *csi,
> > + struct ti_csi2rx_buffer *buf)
> > +{
> > + unsigned long addr;
> > + struct dma_async_tx_descriptor *desc;
> > + size_t len = csi->v_fmt.fmt.pix.sizeimage;
> > + dma_cookie_t cookie;
> > + int ret = 0;
> > +
> > + addr = vb2_dma_contig_plane_dma_addr(&buf->vb.vb2_buf, 0);
> > + desc = dmaengine_prep_slave_single(csi->dma.chan, addr, len,
> > + DMA_DEV_TO_MEM,
> > + DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> > + if (!desc)
> > + return -EIO;
> > +
> > + desc->callback = ti_csi2rx_dma_callback;
> > + desc->callback_param = buf;
> > +
> > + cookie = dmaengine_submit(desc);
> > + ret = dma_submit_error(cookie);
> > + if (ret)
> > + return ret;
> > +
> > + dma_async_issue_pending(csi->dma.chan);
> > +
> > + return 0;
> > +}
> > +
> > +static void ti_csi2rx_cleanup_buffers(struct ti_csi2rx_dev *csi,
> > + enum vb2_buffer_state buf_state)
> > +{
> > + struct ti_csi2rx_dma *dma = &csi->dma;
> > + struct ti_csi2rx_buffer *buf, *tmp;
> > + enum ti_csi2rx_dma_state state;
> > + unsigned long flags;
> > + int ret;
> > +
> > + spin_lock_irqsave(&dma->lock, flags);
> > + state = csi->dma.state;
> > + dma->state = TI_CSI2RX_DMA_STOPPED;
> > + spin_unlock_irqrestore(&dma->lock, flags);
> > +
> > + if (state != TI_CSI2RX_DMA_STOPPED) {
> > + /*
> > + * Normal DMA termination sometimes does not clean up pending
> > + * data on the endpoint.
>
> When is "sometimes"? It's good to be more exact.
>
> > + */
> > + ret = ti_csi2rx_drain_dma(csi);
> > + if (ret)
> > + dev_dbg(csi->dev,
> > + "Failed to drain DMA. Next frame might be bogus\n");
> > + }
> > + ret = dmaengine_terminate_sync(csi->dma.chan);
> > + if (ret)
> > + dev_err(csi->dev, "Failed to stop DMA: %d\n", ret);
> > +
> > + dma_free_coherent(csi->dev, dma->drain.len,
> > + dma->drain.vaddr, dma->drain.paddr);
> > + dma->drain.vaddr = NULL;
> > +
> > + spin_lock_irqsave(&dma->lock, flags);
> > + list_for_each_entry_safe(buf, tmp, &csi->dma.queue, list) {
> > + list_del(&buf->list);
> > + vb2_buffer_done(&buf->vb.vb2_buf, buf_state);
> > + }
> > + list_for_each_entry_safe(buf, tmp, &csi->dma.submitted, list) {
> > + list_del(&buf->list);
> > + vb2_buffer_done(&buf->vb.vb2_buf, buf_state);
> > + }
> > + spin_unlock_irqrestore(&dma->lock, flags);
> > +}
> > +
> > +static int ti_csi2rx_queue_setup(struct vb2_queue *q, unsigned int *nbuffers,
> > + unsigned int *nplanes, unsigned int sizes[],
> > + struct device *alloc_devs[])
> > +{
> > + struct ti_csi2rx_dev *csi = vb2_get_drv_priv(q);
> > + unsigned int size = csi->v_fmt.fmt.pix.sizeimage;
> > +
> > + if (*nplanes) {
> > + if (sizes[0] < size)
> > + return -EINVAL;
> > + size = sizes[0];
> > + }
> > +
> > + *nplanes = 1;
> > + sizes[0] = size;
> > +
> > + return 0;
> > +}
> > +
> > +static int ti_csi2rx_buffer_prepare(struct vb2_buffer *vb)
> > +{
> > + struct ti_csi2rx_dev *csi = vb2_get_drv_priv(vb->vb2_queue);
> > + unsigned long size = csi->v_fmt.fmt.pix.sizeimage;
> > +
> > + if (vb2_plane_size(vb, 0) < size) {
> > + dev_err(csi->dev, "Data will not fit into plane\n");
> > + return -EINVAL;
> > + }
> > +
> > + vb2_set_plane_payload(vb, 0, size);
> > + return 0;
> > +}
> > +
> > +static void ti_csi2rx_buffer_queue(struct vb2_buffer *vb)
> > +{
> > + struct ti_csi2rx_dev *csi = vb2_get_drv_priv(vb->vb2_queue);
> > + struct ti_csi2rx_buffer *buf;
> > + struct ti_csi2rx_dma *dma = &csi->dma;
> > + bool restart_dma = false;
> > + unsigned long flags = 0;
> > + int ret;
> > +
> > + buf = container_of(vb, struct ti_csi2rx_buffer, vb.vb2_buf);
> > + buf->csi = csi;
> > +
> > + spin_lock_irqsave(&dma->lock, flags);
> > + /*
> > + * Usually the DMA callback takes care of queueing the pending buffers.
> > + * But if DMA has stalled due to lack of buffers, restart it now.
> > + */
> > + if (dma->state == TI_CSI2RX_DMA_IDLE) {
> > + /*
> > + * Do not restart DMA with the lock held because
> > + * ti_csi2rx_drain_dma() might block for completion.
> > + * There won't be a race on queueing DMA anyway since the
> > + * callback is not being fired.
> > + */
> > + restart_dma = true;
> > + dma->state = TI_CSI2RX_DMA_ACTIVE;
> > + } else {
> > + list_add_tail(&buf->list, &dma->queue);
> > + }
> > + spin_unlock_irqrestore(&dma->lock, flags);
> > +
> > + if (restart_dma) {
> > + /*
> > + * Once frames start dropping, some data gets stuck in the DMA
> > + * pipeline somewhere. So the first DMA transfer after frame
> > + * drops gives a partial frame. This is obviously not useful to
> > + * the application and will only confuse it. Issue a DMA
> > + * transaction to drain that up.
> > + */
> > + ret = ti_csi2rx_drain_dma(csi);
> > + if (ret)
> > + dev_warn(csi->dev,
> > + "Failed to drain DMA. Next frame might be bogus\n");
> > +
> > + ret = ti_csi2rx_start_dma(csi, buf);
> > + if (ret) {
> > + dev_err(csi->dev, "Failed to start DMA: %d\n", ret);
> > + spin_lock_irqsave(&dma->lock, flags);
> > + vb2_buffer_done(&buf->vb.vb2_buf, VB2_BUF_STATE_ERROR);
> > + dma->state = TI_CSI2RX_DMA_IDLE;
> > + spin_unlock_irqrestore(&dma->lock, flags);
> > + } else {
> > + spin_lock_irqsave(&dma->lock, flags);
> > + list_add_tail(&buf->list, &dma->submitted);
> > + spin_unlock_irqrestore(&dma->lock, flags);
> > + }
> > + }
> > +}
> > +
> > +static int ti_csi2rx_start_streaming(struct vb2_queue *vq, unsigned int count)
> > +{
> > + struct ti_csi2rx_dev *csi = vb2_get_drv_priv(vq);
> > + struct ti_csi2rx_dma *dma = &csi->dma;
> > + struct ti_csi2rx_buffer *buf;
> > + unsigned long flags;
> > + int ret = 0;
> > +
> > + spin_lock_irqsave(&dma->lock, flags);
> > + if (list_empty(&dma->queue))
> > + ret = -EIO;
> > + spin_unlock_irqrestore(&dma->lock, flags);
> > + if (ret)
> > + return ret;
> > +
> > + dma->drain.len = csi->v_fmt.fmt.pix.sizeimage;
> > + dma->drain.vaddr = dma_alloc_coherent(csi->dev, dma->drain.len,
> > + &dma->drain.paddr, GFP_KERNEL);
> > + if (!dma->drain.vaddr)
> > + return -ENOMEM;
>
> This is still allocating a large buffer every time streaming is started (and
> with streams support, a separate buffer for each stream?).
>
> Did you check if the TI DMA can do writes to a constant address? That would
> be the best option, as then the whole buffer allocation problem goes away.
>

I checked with Vignesh, the hardware can support a scenario where we
flush out all the data without allocating a buffer, but I couldn't find
a way to signal that via the current dmaengine framework APIs. Will look
into it further as it will be important for multi-stream support.

> Alternatively, can you flush the buffers with multiple one line transfers?
> The flushing shouldn't be performance critical, so even if that's slower
> than a normal full-frame DMA, it shouldn't matter much. And if that can be
> done, a single probe time line-buffer allocation should do the trick.

There will be considerable overhead if we queue many DMA transactions
(in the order of 1000s or even 100s), which might not be okay for the
scenarios where we have to drain mid-stream. Will have to run some
experiments to see if that is worth it.

But one optimization we can for sure do is re-use a single drain buffer
for all the streams. We will need to ensure to re-allocate the buffer
for the "largest" framesize supported across the different streams at
stream-on time.

My guess is the endpoint is not buffering a full-frame's worth of data,
I will also check if we can upper bound that size to something feasible.

>
> Other than this drain buffer topic, I think this looks fine. So, I'm going
> to give Rb, but I do encourage you to look more into optimizing this drain
> buffer.

Thank you!

>
> Reviewed-by: Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx>
>
> Tomi
>

--
Thanks,
Jai

GPG Fingerprint: 4DE0 D818 E5D5 75E8 D45A AFC5 43DE 91F9 249A 7145

Attachment: signature.asc
Description: PGP signature