Re: [PATCH v13 5/8] media: chips-media: wave5: Add the v4l2 layer

From: Ivan Bornyakov
Date: Fri Nov 03 2023 - 06:43:06 EST


Hi, Deborah

On Thu, Nov 02, 2023 at 10:07:59AM -0700, Deborah Brouwer wrote:
> On Wed, Oct 18, 2023 at 01:13:52AM +0300, Ivan Bornyakov wrote:
> > Hi!
>
> Hi Ivan,
>
> >
> > On Thu, 12 Oct 2023 13:01:03 +0200, Sebastian Fricke wrote:
> > > Add the decoder and encoder implementing the v4l2
> > > API. This patch also adds the Makefile and the VIDEO_WAVE_VPU config
> > >
> > > Signed-off-by: Sebastian Fricke <sebastian.fricke@xxxxxxxxxxxxx>
> > > Signed-off-by: Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx>
> > > Signed-off-by: Deborah Brouwer <deborah.brouwer@xxxxxxxxxxxxx>
> > > Signed-off-by: Robert Beckett <bob.beckett@xxxxxxxxxxxxx>
> > > Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@xxxxxxxxxxxxx>
> > > Signed-off-by: Nas Chung <nas.chung@xxxxxxxxxxxxxxx>
> > > ---
> > > drivers/media/platform/chips-media/Kconfig | 1 +
> > > drivers/media/platform/chips-media/Makefile | 1 +
> > > drivers/media/platform/chips-media/wave5/Kconfig | 12 +
> > > drivers/media/platform/chips-media/wave5/Makefile | 10 +
> > > .../platform/chips-media/wave5/wave5-helper.c | 213 +++
> > > .../platform/chips-media/wave5/wave5-helper.h | 31 +
> > > .../platform/chips-media/wave5/wave5-vpu-dec.c | 1953 ++++++++++++++++++++
> > > .../platform/chips-media/wave5/wave5-vpu-enc.c | 1794 ++++++++++++++++++
> > > .../media/platform/chips-media/wave5/wave5-vpu.c | 291 +++
> > > .../media/platform/chips-media/wave5/wave5-vpu.h | 83 +
> > > .../platform/chips-media/wave5/wave5-vpuapi.h | 2 -
> > > 11 files changed, 4389 insertions(+), 2 deletions(-)
> >
> > [...]
> >
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > new file mode 100644
> > > index 000000000000..74d1fae64fa4
> > > --- /dev/null
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> >
> > [...]
> >
> > > +static int wave5_vpu_dec_queue_setup(struct vb2_queue *q, unsigned int *num_buffers,
> > > + unsigned int *num_planes, unsigned int sizes[],
> > > + struct device *alloc_devs[])
> > > +{
> > > + struct vpu_instance *inst = vb2_get_drv_priv(q);
> > > + struct v4l2_pix_format_mplane inst_format =
> > > + (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) ? inst->src_fmt : inst->dst_fmt;
> > > + unsigned int i;
> > > +
> > > + dev_dbg(inst->dev->dev, "%s: num_buffers: %u | num_planes: %u | type: %u\n", __func__,
> > > + *num_buffers, *num_planes, q->type);
> > > +
> > > + /* the CREATE_BUFS case */
> > > + if (*num_planes) {
> > > + if (inst_format.num_planes != *num_planes)
> > > + return -EINVAL;
> > > +
> > > + for (i = 0; i < *num_planes; i++) {
> > > + if (sizes[i] < inst_format.plane_fmt[i].sizeimage)
> > > + return -EINVAL;
> > > + }
> > > + /* the REQBUFS case */
> > > + } else {
> > > + *num_planes = inst_format.num_planes;
> > > +
> > > + if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> > > + sizes[0] = inst_format.plane_fmt[0].sizeimage;
> > > + dev_dbg(inst->dev->dev, "%s: size[0]: %u\n", __func__, sizes[0]);
> > > + } else if (*num_planes == 1) {
> >
> > I think, you should also set *num_buffers to be inst->fbc_buf_count for
> > V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, like this:
> >
> > } else if (q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
> > if (*num_buffers < inst->fbc_buf_count)
> > *num_buffers = inst->fbc_buf_count;
> >
> > switch (*num_planes) {
> > case 1:
> > ...
> > case 2:
> > ...
> > case 3:
> > ...
> > }
> > }
>
> I was able to reproduce this issue by requesting a small number of buffers
> on the CAPTURE queue that was less than inst→fbc_buf_count. When this happens,
> wave5_vpu_dec_job_ready() fails here:
> (v4l2_m2m_num_dst_bufs_ready(m2m_ctx) < (inst->fbc_buf_count - 1)
>
> I also tested your suggestion to set the *num_buffers to inst→fbc_buf_count
> in queue_setup() and it seems to be working well, thanks for this.
>
> I've been working on ways to improve testing for stateful decoders so
> I'm curious how you found this issue?
>
> With fluster tests that we use, gstreamer seems to be calculating correct number of
> CAPTURE buffers to request, so we wouldn't see this.
>

I use v4l2-ctl and ffmpeg for testing the decoder.

The first time I've encountered the discussed issue was when v4l2-ctl was
able to decode one file but not the other with bigger resolution.

For v4l2-ctl command line will look like this:
v4l2-ctl --stream-from src-file.h265 --stream-to out-file.yuv --stream-mmap --stream-out-mmap

For ffmpeg command line will look like this:
ffmpeg -c:v hevc_v4l2m2m -i src-file.h265 -fps_mode passthrough out-file.yuv

Also for ffmpeg there are options -num_output_buffers and -num_capture_buffers
to set initial numbers of buffers in OUTPUT and CAPTURE queues, by
default they are set to 20 or so.

> >
> > The reason for that is if fbc_buf_count is greater than initial num_buffers,
> > wave5_vpu_dec_job_ready() wouldn't allow to invoke wave5_vpu_dec_device_run()
> >
> > Here is a part of the log of described situation:
> >
> > vdec 20410000.wave515: Switch state from NONE to OPEN.
> > [...]
> > vdec 20410000.wave515: wave5_vpu_dec_init_seq: init seq sent (queue 1 : 1)
> > vdec 20410000.wave515: wave5_vpu_dec_get_seq_info: init seq complete (queue 0 : 0)
> > [...]
> > vdec 20410000.wave515: handle_dynamic_resolution_change: width: 4112 height: 3008 profile: 1 | minbuffer: 6
> > ^^^^^^^^ note that minbuffer is 6
> >
> > vdec 20410000.wave515: Switch state from OPEN to INIT_SEQ.
> > [...]
> > vdec 20410000.wave515: decoder command: 1
> > [...]
> > vdec 20410000.wave515: wave5_vpu_dec_queue_setup: num_buffers: 4 | num_planes: 0 | type: 9
> > ^^^^^^^^ note that num_buffers is 4
> >
> > vdec 20410000.wave515: wave5_vpu_dec_queue_setup: size[0]: 18625536
> > vdec 20410000.wave515: CAPTURE queue must be streaming to queue jobs!
> > vdec 20410000.wave515: CAPTURE queue must be streaming to queue jobs!
> > vdec 20410000.wave515: CAPTURE queue must be streaming to queue jobs!
> > vdec 20410000.wave515: CAPTURE queue must be streaming to queue jobs!
> > vdec 20410000.wave515: wave5_vpu_dec_buf_queue: type: 9 index: 0 size: ([0]=18625536, [1]= 0, [2]= 0)
> > vdec 20410000.wave515: wave5_vpu_dec_buf_queue: type: 9 index: 1 size: ([0]=18625536, [1]= 0, [2]= 0)
> > vdec 20410000.wave515: wave5_vpu_dec_buf_queue: type: 9 index: 2 size: ([0]=18625536, [1]= 0, [2]= 0)
> > vdec 20410000.wave515: wave5_vpu_dec_buf_queue: type: 9 index: 3 size: ([0]=18625536, [1]= 0, [2]= 0)
> > vdec 20410000.wave515: wave5_vpu_dec_start_streaming: type: 9
> > vdec 20410000.wave515: No capture buffer ready to decode!
> > ^^^^^^^^ here v4l2_m2m_num_dst_bufs_ready(m2m_ctx) < (inst->fbc_buf_count - 1), namely 4 < 6
> >
> > vdec 20410000.wave515: wave5_vpu_dec_stop_streaming: type: 9
> > vdec 20410000.wave515: streamoff_capture: Setting display flag of buf index: 0, fail: -22
> > vdec 20410000.wave515: streamoff_capture: Setting display flag of buf index: 1, fail: -22
> > vdec 20410000.wave515: streamoff_capture: Setting display flag of buf index: 2, fail: -22
> > vdec 20410000.wave515: streamoff_capture: Setting display flag of buf index: 3, fail: -22
> > [...]
> > vdec 20410000.wave515: wave5_vpu_dec_close: dec_finish_seq complete
> >
> > Altering num_buffers solves the issue for me.
> >
> > > + if (inst->output_format == FORMAT_422)
> > > + sizes[0] = inst_format.width * inst_format.height * 2;
> > > + else
> > > + sizes[0] = inst_format.width * inst_format.height * 3 / 2;
> > > + dev_dbg(inst->dev->dev, "%s: size[0]: %u\n", __func__, sizes[0]);
> > > + } else if (*num_planes == 2) {
> > > + sizes[0] = inst_format.width * inst_format.height;
> > > + if (inst->output_format == FORMAT_422)
> > > + sizes[1] = inst_format.width * inst_format.height;
> > > + else
> > > + sizes[1] = inst_format.width * inst_format.height / 2;
> > > + dev_dbg(inst->dev->dev, "%s: size[0]: %u | size[1]: %u\n",
> > > + __func__, sizes[0], sizes[1]);
> > > + } else if (*num_planes == 3) {
> > > + sizes[0] = inst_format.width * inst_format.height;
> > > + if (inst->output_format == FORMAT_422) {
> > > + sizes[1] = inst_format.width * inst_format.height / 2;
> > > + sizes[2] = inst_format.width * inst_format.height / 2;
> > > + } else {
> > > + sizes[1] = inst_format.width * inst_format.height / 4;
> > > + sizes[2] = inst_format.width * inst_format.height / 4;
> > > + }
> > > + dev_dbg(inst->dev->dev, "%s: size[0]: %u | size[1]: %u | size[2]: %u\n",
> > > + __func__, sizes[0], sizes[1], sizes[2]);
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > +}
> >
> > BTW I'm currently tinkering with your patchset on another C&M IP and would be
> > gratefull if you Cc me in the future versions of the patchset, if any.
>
> Yes, sorry for missing you on v13, thanks for taking the time to review.
>
> Deborah
>
> >
> > Thanks.