Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

From: Tomasz Figa
Date: Fri Jul 28 2023 - 03:27:08 EST


On Fri, Jul 28, 2023 at 4:09 PM Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> wrote:
>
>
>
> On 7/28/23 12:43, Tomasz Figa wrote:
> >
> > On Fri, Jul 28, 2023 at 1:58 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> >>
> >> Le jeudi 27 juillet 2023 à 16:43 +0900, Tomasz Figa a écrit :
> >>> On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
> >>>>> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> >>>>>> From: "Hsia-Jun(Randy) Li" <randy.li@xxxxxxxxxxxxx>
> >>>>>>
> >>>>>> Many drivers have to create their own buf_struct for a
> >>>>>> vb2_queue to track such state. A driver also has to
> >>>>>> iterate over rdy_queue every time to find a buffer
> >>>>>> that has not been sent to the hardware (or firmware). This new
> >>>>>> list just offers the driver a place to store the buffers
> >>>>>> that the hardware (or firmware) has acknowledged.
> >>>>>>
> >>>>>> One important advantage of this list: unlike rdy_queue,
> >>>>>> which can be operated on both by the bottom half of the
> >>>>>> user's call and by the v4l2 worker, the v4l2 core would
> >>>>>> only operate on this queue when its v4l2_context is not
> >>>>>> running, and the driver would only access this new
> >>>>>> hw_queue in its own worker.
> >>>>>
> >>>>> Could you describe in what case such a list would be useful for a
> >>>>> mem2mem driver?
> >>>>
> >>>> Today all drivers must track buffers that are "owned by the hardware". This is a
> >>>> concept dictated by the m2m framework and enforced through the ACTIVE flag. All
> >>>> buffers from this list must be marked as done/error/queued after streamoff of the
> >>>> respective queue in order to acknowledge that they are no longer in use by the
> >>>> HW. Not doing so will warn:
> >>>>
> >>>> videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
> >>>>
> >>>> However, there is no queue to easily iterate over them. All drivers end up having
> >>>> their own queue, or just leaving the buffers in the rdy_queue (which isn't better).
> >>>>
> >>>
> >>> Thanks for the explanation. I see how it could be useful now.
> >>>
> >>> Although I guess this is a problem specifically for hardware (or
> >>> firmware) which can internally queue more than 1 buffer, right?
> >>> Otherwise the current buffer could just stay at the top of the
> >>> rdy_queue until it's removed by the driver's completion handler,
> >>> timeout/error handler or context destruction.
> >>
> >> Correct, it's only an issue when you need to process multiple src buffers before
> >> producing a dst buffer. It affects stateful decoders, stateful encoders and
> >> deinterlacers as far as I'm aware.
> >
> > Is it actually necessary to keep those buffers in a list in that case, though?
> > I can see that a deinterlacer would indeed need 2 input buffers to
> > perform the deinterlacing operation, but those would be just known to
> > the driver, since it's running the task currently.
> > For a stateful decoder, wouldn't it just consume the bitstream buffer
> > (producing something partially decoded to its own internal buffers)
> > and return it shortly?
> Display reordering. Firmware could do such batch work: take a few
> bitstream buffers, then output a list of graphics buffers in display
> order, and also stop using a non-display buffer once it is
> removed from the DPB.
>
> Even in one-input, one-output mode, the firmware still needs to do the
> reordering and let the driver know when a graphics buffer can be displayed,
> so the firmware would usually hold the graphics buffer (frame) until its
> display time.
>

Okay, so that hold would be for frame buffers, not bitstream buffers, right?
But yeah, I see that then it could hold onto those buffers until it's
their turn to display and it could be a bigger number of frames,
depending on the complexity of the codec.

> Besides, I hate it when a driver occupies a large amount of memory that
> the user never asked for. I would like to drop those internal buffers.

I think this is one reason to migrate to the stateless decoder design.

> > The most realistic scenario would be for stateful encoders which could
> > keep some input buffers as reference frames for further encoding, but
> > then would this patch actually work for them? It would make
> > __v4l2_m2m_try_queue never add the context to the job_queue if there
> > are some buffers in that hw_queue list.
> why?
> >
> > Maybe what I need here are actual patches modifying some existing
> > drivers. Randy, would you be able to include that in the next version?
> Maybe not. The Synaptics VideoSmart is a secure video platform (DRM). I
> could release a snapshot of the driver once I get permission, which
> would be after the official release of the SDK.
> But you may not be able to compile it, because we have our own TEE
> interface (not optee), nor run it, because the trusted app would be
> signed with a per-device key.

Could you modify another, already existing driver then?
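
To make it concrete what I would expect such a conversion to demonstrate,
here is a minimal userspace sketch of the bookkeeping discussed in this
thread: a buffer moves from rdy_queue to hw_queue when it is handed to the
hardware (or firmware), and stop_streaming returns everything still sitting
on either list. This is only a model under stated assumptions, not kernel
code: every model_* name, the state enum and the small list helpers are
hypothetical stand-ins, not the V4L2 or vb2 API, and locking is left out
entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tiny circular doubly linked list, loosely modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void model_list_init(struct list_head *h) { h->next = h->prev = h; }
static bool model_list_empty(const struct list_head *h) { return h->next == h; }

static void model_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void model_list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical buffer states; they only mirror the idea of vb2 buffer states. */
enum model_buf_state { MODEL_QUEUED, MODEL_ACTIVE, MODEL_DONE, MODEL_ERROR };

struct model_buf {
	struct list_head node;	/* first member, so a node pointer is also a buf pointer */
	enum model_buf_state state;
	int index;
};

/* Hypothetical per-queue context: a buffer is on at most one of the two lists. */
struct model_queue_ctx {
	struct list_head rdy_queue;	/* queued by userspace, not yet sent to hardware */
	struct list_head hw_queue;	/* acknowledged by hardware, not yet returned */
};

static void model_queue_init(struct model_queue_ctx *q)
{
	model_list_init(&q->rdy_queue);
	model_list_init(&q->hw_queue);
}

/* Userspace queued a buffer: it becomes ready for processing. */
static void model_buf_queue(struct model_queue_ctx *q, struct model_buf *b)
{
	b->state = MODEL_QUEUED;
	model_list_add_tail(&b->node, &q->rdy_queue);
}

/* Hand the oldest ready buffer to the hardware: move it from rdy_queue to hw_queue. */
static struct model_buf *model_submit_next(struct model_queue_ctx *q)
{
	struct model_buf *b;

	if (model_list_empty(&q->rdy_queue))
		return NULL;
	b = (struct model_buf *)q->rdy_queue.next;
	model_list_del(&b->node);
	b->state = MODEL_ACTIVE;
	model_list_add_tail(&b->node, &q->hw_queue);
	return b;
}

/* Hardware finished with a buffer: take it off hw_queue and mark it done. */
static void model_buf_done(struct model_buf *b)
{
	model_list_del(&b->node);
	b->state = MODEL_DONE;
}

/* Return every buffer on one list with an error state; gives the count returned. */
static int model_drain(struct list_head *h)
{
	int n = 0;

	while (!model_list_empty(h)) {
		struct model_buf *b = (struct model_buf *)h->next;

		model_list_del(&b->node);
		b->state = MODEL_ERROR;
		n++;
	}
	return n;
}

/* Stop streaming: nothing may be left on either list afterwards. */
static int model_stop_streaming(struct model_queue_ctx *q)
{
	return model_drain(&q->hw_queue) + model_drain(&q->rdy_queue);
}
```

With a shared list like this, the stop_streaming path is a plain walk over
two lists instead of a search through driver-private structures. Whether
that holds up in a real driver, where the firmware may still own the
buffers at that point, is exactly what a converted driver would show.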

> > Thanks.
> >
> > Best regards,
> > Tomasz
> >
> >>
> >> Nicolas
> >>
> >>>
> >>> Best regards,
> >>> Tomasz
> >>>
> >>>> Nicolas
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Hsia-Jun(Randy) Li <randy.li@xxxxxxxxxxxxx>
> >>>>>> ---
> >>>>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> >>>>>> include/media/v4l2-mem2mem.h | 10 +++++++++-
> >>>>>> 2 files changed, 26 insertions(+), 9 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>>>> index c771aba42015..b4151147d5bd 100644
> >>>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>>>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> >>>>>> goto job_unlock;
> >>>>>> }
> >>>>>>
> >>>>>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
> >>>>>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >>>>>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >>>>>> - dprintk("No input buffers available\n");
> >>>>>> - goto job_unlock;
> >>>>>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> >>>>>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
> >>>>>> +
> >>>>>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >>>>>> + dprintk("No input buffers available\n");
> >>>>>> + goto job_unlock;
> >>>>>> + }
> >>>>>> }
> >>>>>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >>>>>> - dprintk("No output buffers available\n");
> >>>>>> - goto job_unlock;
> >>>>>> +
> >>>>>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> >>>>>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >>>>>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >>>>>> + dprintk("No output buffers available\n");
> >>>>>> + goto job_unlock;
> >>>>>> + }
> >>>>>> }
> >>>>>
> >>>>> src and dst would be referenced uninitialized below if neither of the
> >>>>> above ifs hits...
> >>>>>
> >>>>> Best regards,
> >>>>> Tomasz
> >>>>>
> >>>>>>
> >>>>>> m2m_ctx->new_frame = true;
> >>>>>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> >>>>>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
> >>>>>> q_ctx->num_rdy = 0;
> >>>>>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> >>>>>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
> >>>>>>
> >>>>>> if (m2m_dev->curr_ctx == m2m_ctx) {
> >>>>>> m2m_dev->curr_ctx = NULL;
> >>>>>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> >>>>>>
> >>>>>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> >>>>>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> >>>>>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> >>>>>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> >>>>>> spin_lock_init(&out_q_ctx->rdy_spinlock);
> >>>>>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
> >>>>>>
> >>>>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> >>>>>> index d6c8eb2b5201..2342656e582d 100644
> >>>>>> --- a/include/media/v4l2-mem2mem.h
> >>>>>> +++ b/include/media/v4l2-mem2mem.h
> >>>>>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> >>>>>> * processed
> >>>>>> *
> >>>>>> * @q: pointer to struct &vb2_queue
> >>>>>> - * @rdy_queue: List of V4L2 mem-to-mem queues
> >>>>>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> >>>>>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> >>>>>> + * by user would be added to this list.
> >>>>>> * @rdy_spinlock: spin lock to protect the struct usage
> >>>>>> * @num_rdy: number of buffers ready to be processed
> >>>>>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> >>>>>> + * (or device's firmware). A buffer could only be in either
> >>>>>> + * this list or @rdy_queue.
> >>>>>> + * Driver may choose not to use this list while uses its own
> >>>>>> + * private data to do this work.
> >>>>>> * @buffered: is the queue buffered?
> >>>>>> *
> >>>>>> * Queue for buffers ready to be processed as soon as this
> >>>>>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> >>>>>> struct list_head rdy_queue;
> >>>>>> spinlock_t rdy_spinlock;
> >>>>>> u8 num_rdy;
> >>>>>> + struct list_head hw_queue;
> >>>>>> bool buffered;
> >>>>>> };
> >>>>>>
> >>>>>> --
> >>>>>> 2.17.1
> >>>>>>
> >>>>
> >>
>
> --
> Hsia-Jun(Randy) Li