Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

From: Nicolas Dufresne
Date: Fri Jul 28 2023 - 12:09:29 EST


On Friday, 28 July 2023 at 13:43 +0900, Tomasz Figa wrote:
> On Fri, Jul 28, 2023 at 1:58 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> >
> > On Thursday, 27 July 2023 at 16:43 +0900, Tomasz Figa wrote:
> > > On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Wednesday, 12 July 2023 at 09:33 +0000, Tomasz Figa wrote:
> > > > > On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> > > > > > From: "Hsia-Jun(Randy) Li" <randy.li@xxxxxxxxxxxxx>
> > > > > >
> > > > > > Many drivers have to create their own buf_struct for a
> > > > > > vb2_queue to track such a state. Also, a driver has to
> > > > > > iterate over rdy_queue every time to find a buffer
> > > > > > which has not been sent to the hardware (or firmware). This
> > > > > > new list just offers the driver a place to store the buffers
> > > > > > that the hardware (firmware) has acknowledged.
> > > > > >
> > > > > > One important advantage of this list: unlike rdy_queue,
> > > > > > which both the bottom half of the user call and the
> > > > > > v4l2 worker can operate on, the v4l2 core can only
> > > > > > operate on this queue when its v4l2_context is not
> > > > > > running, and the driver only accesses this new hw_queue
> > > > > > in its own worker.
> > > > >
> > > > > Could you describe in what case such a list would be useful for a
> > > > > mem2mem driver?
> > > >
> > > > Today all drivers must track buffers that are "owned by the hardware". This is a
> > > > concept dictated by the m2m framework and enforced through the ACTIVE flag. All
> > > > buffers from this list must be marked as done/error/queued after streamoff of the
> > > > respective queue in order to acknowledge that they are no longer in use by the
> > > > HW. Not doing so will warn:
> > > >
> > > > videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
> > > >
> > > > However, there is no queue to easily iterate over them. All drivers end up having
> > > > their own queue, or just leaving the buffers in the rdy_queue (which isn't better).
> > > >
> > >
> > > Thanks for the explanation. I see how it could be useful now.
> > >
> > > Although I guess this is a problem specifically for hardware (or
> > > firmware) which can internally queue more than 1 buffer, right?
> > > Otherwise the current buffer could just stay at the top of the
> > > rdy_queue until it's removed by the driver's completion handler,
> > > timeout/error handler or context destruction.
> >
> > Correct, it's only an issue when you need to process multiple src buffers before
> > producing a dst buffer. It affects stateful decoders, stateful encoders and
> > deinterlacers as far as I'm aware.
>
> Is it actually necessary to keep those buffers in a list in that case, though?
> I can see that a deinterlacer would indeed need 2 input buffers to
> perform the deinterlacing operation, but those would be just known to
> the driver, since it's running the task currently.
> For a stateful decoder, wouldn't it just consume the bitstream buffer
> (producing something partially decoded to its own internal buffers)
> and return it shortly?

In practice, in a stateful decoder, we pace the consumption of input buffers;
otherwise we just end up consuming the entire video into a ring buffer, which
makes operations like seeks quite heavy and causes CPU spikes.

That being said, I'm not sure how useful a list would be for bitstream buffers.
At the moment, in my current work, I'm leaving buffers in the ready queue, and
just tagging the ones I have already copied into the ring buffer. I remove them
from the ready list once the related data has been decoded. This is when I
actually copy the timestamp from the src to the dst buffer. So in short, I don't
use an extra list, but I do use some marking on the buffers to remember which
ones have already been copied. This is specific to ring buffer based codecs, of
course.
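
To make the tagging idea concrete, here is a minimal userspace sketch. It is not
driver code and uses no V4L2 API: struct src_buf, struct src_queue and the
src_queue_*() helpers are all hypothetical names, and a small FIFO stands in
for the OUTPUT ready queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SRC 8

/* Hypothetical per-buffer state; not a V4L2 structure. */
struct src_buf {
	uint64_t timestamp;
	bool copied;	/* payload already copied into the ring buffer */
};

/* Hypothetical stand-in for the OUTPUT ready queue, kept in FIFO order. */
struct src_queue {
	struct src_buf bufs[MAX_SRC];
	size_t head;	/* index of the oldest queued buffer */
	size_t count;	/* number of queued buffers */
};

/* Queue a new bitstream buffer; it starts out untagged. */
static bool src_queue_push(struct src_queue *q, uint64_t timestamp)
{
	struct src_buf *b;

	if (q->count == MAX_SRC)
		return false;
	b = &q->bufs[(q->head + q->count) % MAX_SRC];
	b->timestamp = timestamp;
	b->copied = false;
	q->count++;
	return true;
}

/* Return the oldest buffer not yet copied into the ring buffer, or NULL. */
static struct src_buf *src_queue_next_uncopied(struct src_queue *q)
{
	size_t i;

	for (i = 0; i < q->count; i++) {
		struct src_buf *b = &q->bufs[(q->head + i) % MAX_SRC];

		if (!b->copied)
			return b;
	}
	return NULL;
}

/*
 * Called once the data of the oldest buffer has been decoded: drop it
 * from the queue and hand its timestamp over to the dst buffer.
 * Fails if the oldest buffer was never copied into the ring buffer.
 */
static bool src_queue_retire(struct src_queue *q, uint64_t *dst_timestamp)
{
	assert(q && dst_timestamp);
	if (q->count == 0 || !q->bufs[q->head].copied)
		return false;
	*dst_timestamp = q->bufs[q->head].timestamp;
	q->head = (q->head + 1) % MAX_SRC;
	q->count--;
	return true;
}
```

The point of the sketch is that the buffers never leave the queue until they
are decoded; the copied flag alone distinguishes what the ring buffer has
already seen.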

The one where a second list helps is for display picture buffers. When a buffer
has been filled, if it's in the ready queue, I currently remove that buffer and
put it in a custom list. It will then be removed when/if the firmware decides to
display it. It may also never be displayed, and be reused by the firmware. In
short, these are the frames "owned" by the firmware and containing valid pixels.
The rdy list contains free picture buffers, and their pixels are undefined.
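
As a rough illustration of that two-list arrangement, here is a userspace
sketch. The tiny list helpers only mimic the kernel's list_head, and struct
pic_buf, struct dec_ctx and the dec_*() functions are hypothetical names, not
anything from v4l2-mem2mem. The case where the firmware reuses a picture
without displaying it is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace doubly linked list, modelled on the kernel's list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *h)
{
	h->prev = h->next = h;
}

static int list_is_empty(const struct node *h)
{
	return h->next == h;
}

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static void list_add_tail_node(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical picture buffer. */
struct pic_buf {
	struct node link;
	int index;
};

/* Hypothetical decoder context holding both lists. */
struct dec_ctx {
	struct node rdy;	/* free pictures, pixel content undefined */
	struct node filled;	/* valid pictures still owned by the firmware */
};

static void dec_ctx_init(struct dec_ctx *ctx)
{
	list_init(&ctx->rdy);
	list_init(&ctx->filled);
}

/* Userspace queued a free picture buffer. */
static void dec_queue_free(struct dec_ctx *ctx, struct pic_buf *buf)
{
	list_add_tail_node(&buf->link, &ctx->rdy);
}

/* The firmware decoded into buf: move it from the ready list to filled. */
static void dec_picture_filled(struct dec_ctx *ctx, struct pic_buf *buf)
{
	assert(!list_is_empty(&ctx->rdy));
	list_del_node(&buf->link);
	list_add_tail_node(&buf->link, &ctx->filled);
}

/* The firmware decided to display buf: it leaves the filled list. */
static void dec_picture_displayed(struct dec_ctx *ctx, struct pic_buf *buf)
{
	assert(!list_is_empty(&ctx->filled));
	list_del_node(&buf->link);
}
```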

Maybe, and I'm ready to try, I could also leave them in the ready queue and opt
for marking and a counter. As I'm using a job_ready() function, it's my driver
that decides whether a device_run() should be executed or not. So what matters
is basically whether there is a free buffer for a new decode operation, and a
counter of filled but not yet displayed buffers could probably do that.
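
A sketch of what such a counter-based check might look like, in plain userspace
C. The struct cnt_ctx fields and the cnt_*() helpers are hypothetical and only
model the bookkeeping; in a real driver the predicate would sit behind the
job_ready() callback, and the assumption here is that filled pictures stay in
the ready queue until they are displayed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver context; none of these fields exist in v4l2-mem2mem. */
struct cnt_ctx {
	unsigned int num_src_rdy;	/* bitstream buffers queued by userspace */
	unsigned int num_dst_rdy;	/* picture buffers in the ready queue */
	unsigned int num_filled;	/* of those, filled but not yet displayed */
};

/* A decode can only start with input data and at least one free picture. */
static bool cnt_job_ready(const struct cnt_ctx *ctx)
{
	assert(ctx->num_filled <= ctx->num_dst_rdy);
	return ctx->num_src_rdy > 0 && ctx->num_dst_rdy > ctx->num_filled;
}

/* The firmware filled one picture; it stays in the ready queue, marked. */
static void cnt_picture_filled(struct cnt_ctx *ctx)
{
	assert(ctx->num_filled < ctx->num_dst_rdy);
	ctx->num_filled++;
}

/* The firmware displayed one picture; it leaves the ready queue for good. */
static void cnt_picture_displayed(struct cnt_ctx *ctx)
{
	assert(ctx->num_filled > 0);
	ctx->num_filled--;
	ctx->num_dst_rdy--;
}
```

With two queued pictures, the check stays true after the first one is filled
and turns false after the second, until userspace queues another free picture.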

> The most realistic scenario would be for stateful encoders which could
> keep some input buffers as reference frames for further encoding, but
> then would this patch actually work for them? It would make
> __v4l2_m2m_try_queue never add the context to the job_queue if there
> are some buffers in that hw_queue list.

Encoders have three sets of buffers, despite m2m having two queues. OUTPUT
buffers are the pictures, there is a set of internal reconstruction buffers, and
finally the CAPTURE buffers are the bitstream. Bitstream buffers are subject to
reordering, so conceptually the firmware holds more than one, and reconstruction
buffers are completely hidden.

>
> Maybe what I need here are actual patches modifying some existing
> drivers. Randy, would you be able to include that in the next version?
> Thanks.

Agreed.

>
> Best regards,
> Tomasz
>
> >
> > Nicolas
> >
> > >
> > > Best regards,
> > > Tomasz
> > >
> > > > Nicolas
> > > > >
> > > > > >
> > > > > > Signed-off-by: Hsia-Jun(Randy) Li <randy.li@xxxxxxxxxxxxx>
> > > > > > ---
> > > > > > drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> > > > > > include/media/v4l2-mem2mem.h | 10 +++++++++-
> > > > > > 2 files changed, 26 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > > index c771aba42015..b4151147d5bd 100644
> > > > > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > > @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > > > > > goto job_unlock;
> > > > > > }
> > > > > >
> > > > > > - src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > > > - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > > > - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > > > - dprintk("No input buffers available\n");
> > > > > > - goto job_unlock;
> > > > > > + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> > > > > > + src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > > > +
> > > > > > + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > > > + dprintk("No input buffers available\n");
> > > > > > + goto job_unlock;
> > > > > > + }
> > > > > > }
> > > > > > - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > > > - dprintk("No output buffers available\n");
> > > > > > - goto job_unlock;
> > > > > > +
> > > > > > + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> > > > > > + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > > > + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > > > + dprintk("No output buffers available\n");
> > > > > > + goto job_unlock;
> > > > > > + }
> > > > > > }
> > > > >
> > > > > src and dst would be referenced uninitialized below if neither of the
> > > > > above ifs hits...
> > > > >
> > > > > Best regards,
> > > > > Tomasz
> > > > >
> > > > > >
> > > > > > m2m_ctx->new_frame = true;
> > > > > > @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> > > > > > INIT_LIST_HEAD(&q_ctx->rdy_queue);
> > > > > > q_ctx->num_rdy = 0;
> > > > > > spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> > > > > > + INIT_LIST_HEAD(&q_ctx->hw_queue);
> > > > > >
> > > > > > if (m2m_dev->curr_ctx == m2m_ctx) {
> > > > > > m2m_dev->curr_ctx = NULL;
> > > > > > @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> > > > > >
> > > > > > INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> > > > > > INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> > > > > > + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> > > > > > + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> > > > > > spin_lock_init(&out_q_ctx->rdy_spinlock);
> > > > > > spin_lock_init(&cap_q_ctx->rdy_spinlock);
> > > > > >
> > > > > > diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> > > > > > index d6c8eb2b5201..2342656e582d 100644
> > > > > > --- a/include/media/v4l2-mem2mem.h
> > > > > > +++ b/include/media/v4l2-mem2mem.h
> > > > > > @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> > > > > > * processed
> > > > > > *
> > > > > > * @q: pointer to struct &vb2_queue
> > > > > > - * @rdy_queue: List of V4L2 mem-to-mem queues
> > > > > > + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> > > > > > + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> > > > > > + * by user would be added to this list.
> > > > > > * @rdy_spinlock: spin lock to protect the struct usage
> > > > > > * @num_rdy: number of buffers ready to be processed
> > > > > > + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> > > > > > + * (or device's firmware). A buffer could only be in either
> > > > > > + * this list or @rdy_queue.
> > > > > > + * Driver may choose not to use this list while uses its own
> > > > > > + * private data to do this work.
> > > > > > * @buffered: is the queue buffered?
> > > > > > *
> > > > > > * Queue for buffers ready to be processed as soon as this
> > > > > > @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> > > > > > struct list_head rdy_queue;
> > > > > > spinlock_t rdy_spinlock;
> > > > > > u8 num_rdy;
> > > > > > + struct list_head hw_queue;
> > > > > > bool buffered;
> > > > > > };
> > > > > >
> > > > > > --
> > > > > > 2.17.1
> > > > > >
> > > >
> >