Re: [PATCH v12 1/7] media: v4l2: Add ignore_streaming flag

From: Hans Verkuil
Date: Mon Sep 25 2023 - 05:03:20 EST


On 22/09/2023 22:20, Nicolas Dufresne wrote:
> Le vendredi 22 septembre 2023 à 10:28 +0200, Hans Verkuil a écrit :
>> On 21/09/2023 20:39, Nicolas Dufresne wrote:
>>> Le mercredi 20 septembre 2023 à 16:49 +0200, Hans Verkuil a écrit :
>>>> On 20/09/2023 16:08, Nicolas Dufresne wrote:
>>>>> cc Tomasz Figa
>>>>>
>>>>> Le mercredi 20 septembre 2023 à 14:59 +0200, Hans Verkuil a écrit :
>>>>>> On 15/09/2023 23:11, Sebastian Fricke wrote:
>>>>>>> Add a new flag to the `struct v4l2_m2m_dev` to toggle whether a queue
>>>>>>> must be streaming in order to allow queuing jobs to the ready queue.
>>>>>>> Currently, both queues (CAPTURE & OUTPUT) must be streaming in order to
>>>>>>> allow adding new jobs. This behavior limits the usability of M2M for
>>>>>>> some drivers, as these have to be able, to perform analysis of the
>>>>>>
>>>>>> able, to -> able to
>>>>>>
>>>>>>> sequence to ensure, that userspace prepares the CAPTURE queue correctly.
>>>>>>
>>>>>> ensure, that -> ensure that
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Sebastian Fricke <sebastian.fricke@xxxxxxxxxxxxx>
>>>>>>> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>> include/media/v4l2-mem2mem.h | 17 +++++++++++++++++
>>>>>>> 1 file changed, 17 insertions(+)
>>>>>>>
>>>>>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>>>>>>> index d6c8eb2b5201..97a48e61e358 100644
>>>>>>> --- a/include/media/v4l2-mem2mem.h
>>>>>>> +++ b/include/media/v4l2-mem2mem.h
>>>>>>> @@ -57,6 +57,16 @@ struct v4l2_m2m_dev;
>>>>>>> * @rdy_spinlock: spin lock to protect the struct usage
>>>>>>> * @num_rdy: number of buffers ready to be processed
>>>>>>> * @buffered: is the queue buffered?
>>>>>>> + * @ignore_streaming: Dictates whether the queue must be streaming for a job to
>>>>>>> + * be queued.
>>>>>>> + * This is useful, for example, when the driver requires to
>>>>>>> + * initialize the sequence with a firmware, where only a
>>>>>>> + * queued OUTPUT queue buffer and STREAMON on the OUTPUT
>>>>>>> + * queue is required to perform the anlysis of the bitstream
>>>>>>> + * header.
>>>>>>> + * This means the driver is responsible for implementing the
>>>>>>> + * job_ready callback correctly to make sure that requirements
>>>>>>> + * for actual decoding are met.
>>>>>>
>>>>>> This is a bad description and field name.
>>>>>
>>>>> I wonder what's your opinion about the buffered one then :-D
>>>>
>>>> Even worse :-)
>>>>
>>>> I still don't really understand what that does. Patches welcome.
>>>>
>>>>>
>>>>>>
>>>>>> Basically what this field does is that, if true, the streaming state of the
>>>>>> capture queue is ignored. So just call it that: ignore_cap_streaming.
>>>>>>
>>>>>> And explain that, if true, job_ready() will be called even if the capture
>>>>>> queue is not streaming, and that that can be used to allow hardware to
>>>>>> analyze the bitstream header that arrives on the OUTPUT queue.
>>>>>
>>>>> Ack.
>>>>>
>>>>>>
>>>>>> Also, doesn't this field belong to struct v4l2_m2m_ctx? It makes no sense
>>>>>> for the output queue, this is really a configuration for the m2m context as
>>>>>> a whole.
>>>>>
>>>>> Unless we come up with a completely new type of M2M that can behave like a gap
>>>>> filler (like a video rate m2m), it indeed makes no sense for output. I'm just
>>>>> illustrating that this is true "now" but someone can come up with valid
>>>>> expectation. So I agree with you, we can move it up in the hierarchy.
>>>>>
>>>>> Recently over IRC and other threads, Tomasz raised a concern that CODECs where
>>>>> introducing too much complexity into M2M. And I believe buffered (which is
>>>>> barely documented) and this mechanism was being pointed.
>>>>>
>>>>> My take on that is that adding boolean configuration is what introduce
>>>>> complexity, and we can fix it by doing less in the m2m. After this discussion, I
>>>>> came with the idea that we should remove buffered and ignore_streaming. For
>>>>> drivers that don't implement job_ready, this logic would be moved inside the
>>>>> default implementation. We can then add a helper to check the common conditions.
>>>>>
>>>>> The alternative suggested by Tomasz, was to layer two ops. We'd have a
>>>>> device_ready() ops and its default implementation would include the check we
>>>>> have and would call job_ready(). Personally, I'd rather remove then add, but I
>>>>> understadt the reasoning and would be fine committing to that instead.
>>>>>
>>>>> I'd like your feedback on this proposal. If this is something we want, I'll do
>>>>> this prior to V13, otherwise we will address your comments and fix the added
>>>>> mechanism. I think though that we agree that for decoders, this is nice addition
>>>>> to not have to trigger work manually from vb2 ops.
>>>>
>>>> It comes down to a matter of taste, I guess. I personally think that using bools
>>>> to tweak the behavior of a framework does not necessarily increase complexity,
>>>> provided it is clearly documented what it does and why it is needed.
>>>>
>>>> I think an ignore_cap_streaming bool is pretty straightforward and has minimal
>>>> impact in the code. As long as there are good comments.
>>>
>>> So for wave5 we will opt for this and apply your suggested changes. And I may
>>> come back later on the subject.
>>>
>>>>
>>>> The 'buffered' flag is were this clearly failed completely, since I couldn't figure
>>>> out what it is supposed to do. But that is not because it makes the code more
>>>> complex, it is just because of shoddy documentation and naming.
>>>>
>>>> Quite often implementing tweaks like that are quite easy in a framework, since
>>>> you have all the information readily available. In a driver it can quickly become
>>>> messy.
>>>
>>> In this case, "buffered" is used to disable the checks for having at least one
>>> buffer in the ready queues. In most cases, if you don't have at least 1 pending
>>> capture and 1 pending output buffer, there is no point in calling device_run.
>>
>> So it is really similar to ignore_cap_streaming: that relaxes the streaming test,
>> and 'buffered' relaxes the 'must have at least one capture and output buffer ready'
>> test.
>>
>> So this should be renamed to: allow_empty_queues
>>
>> Although I would prefer to split this into two bools: allow_empty_capture_queue and
>> allow_empty_output_queue. It is more flexible that way and I actually think it is
>> easier to understand.
>
> Its on the queue ctx, so it does not have to be typed. It would have to be typed
> if moved to m2m ctx.

Oops, I missed that. I'm not actually sure that's where it should be. If you
support flags that tweak the behavior, then put them together in a single place,
and not all over.

Regards,

Hans

>
>>
>> I see also see in the v4l2-mem2mem.c source that the debug messages are very poorly
>> worded:
>> src = v4l2_m2m_next_src_buf(m2m_ctx);
>>
>> if (!src && !m2m_ctx->out_q_ctx.buffered) {
>> dprintk("No input buffers available\n");
>> goto job_unlock;
>> }
>>
>> This should be either "source buffers" or "output buffers", but definitely not
>> "input buffers".
>>
>> Ditto for the dst part.
>
> Indeed, I'll store this node somewhere for future work on the framework, this is
> not strictly related to wave5 anymore.
>
>>
>>>
>>> In reality, drivers will add use case specific checks in their job_ready()
>>> implementation. For decoders, the cases I can think of are:
>>>
>>> - On capture if you haven't parsed the stream header
>>> - On capture if the driver removes them from ready queue as a way to track which
>>> one are considered free and may be used at any time by the firmware
>>> - On output queue, if you need device_run() to be called to complete the drain
>>> the reorder queue
>>>
>>> Yet, you want this check after stream headers are parsed, or whenever a new
>>> bitstream decode operation is to be queued in the firmware. So this check gets
>>> re-implemented, but dynamically, in all decoders.
>>>
>>> Deinterlacers may needs this too with some algorithms (the one that introduce
>>> delays at least). Its not clear to me why it was called buffered,
>>> ignore_rdy_queue might have been an option, though I'm not fully confident. Note
>>> that M2M can be confusing, since whenever you ask for last something, its always
>>> relative to the ready queue, and may not make a lot of sense in the context it
>>> is used.
>>>
>>>>
>>>> For codec support there are a number of issues that increase complexity:
>>>> implementing support for the LAST flag and events, and supporting buffers
>>>> that can be held. Especially since driver implementations tend to vary.
>>>>
>>>> I've been experimenting with some cleanups and changes in v4l2-mem2mem.c
>>>> (https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=enc-dec-cmd), mainly
>>>> surrounding the handling of the LAST flag. Note: this is failing the compliance
>>>> tests, I haven't had the time to pursue this further.
>>>>
>>>> I'm not sure whether the best approach is to move things out of the m2m framework,
>>>> or move things into the m2m framework, or add a more codec-specific layer on top
>>>> of the m2m framework, or a combination of all of these.
>>>>
>>>> It is something that needs experimentation, just see what works.
>>>
>>> I can see you have omitted mark_stopped() calles when refactoring, which makes
>>> these patches change the behaviour. Could be related.
>>
>> Could be. I hope to be able to spend a bit of time on this today.
>>
>>>
>>> This is no longer strictly related to this patch, but I think cmd_stop()
>>> implementation (even after your changes) are miss-fit for driver that speaks to
>>> firmware. As the firmware is being made aware of the free buffers, you can't
>>> just cherry-pick from the capture queue, you have to synchronise your state with
>>> the firmware while draining. The helper should be split in two parts I suppose,
>>> but cutting the line isn't easy.
>>>
>>> Thread safe usage of the numerous boolean implicated in the draining state is
>>> also difficult. There is no other option then introduce a mutex or spinlock (if
>>> the state is needed in job_ready() implementation) to make this thread safe and
>>> reliable.
>>
>> Regards,
>>
>> Hans
>>
>>>
>>>>
>>>> But for this specific flag: I think it is fine to put that in the m2m framework,
>>>> just document and name it well.
>>>
>>> Ack.
>>>
>>>>
>>>> Regards,
>>>>
>>>> Hans
>>>>
>>>>>
>>>>> regards,
>>>>> Nicolas
>>>>>
>>>>>>
>>>>>>> *
>>>>>>> * Queue for buffers ready to be processed as soon as this
>>>>>>> * instance receives access to the device.
>>>>>>> @@ -69,6 +79,7 @@ struct v4l2_m2m_queue_ctx {
>>>>>>> spinlock_t rdy_spinlock;
>>>>>>> u8 num_rdy;
>>>>>>> bool buffered;
>>>>>>> + bool ignore_streaming;
>>>>>>> };
>>>>>>>
>>>>>>> /**
>>>>>>> @@ -564,6 +575,12 @@ static inline void v4l2_m2m_set_dst_buffered(struct v4l2_m2m_ctx *m2m_ctx,
>>>>>>> m2m_ctx->cap_q_ctx.buffered = buffered;
>>>>>>> }
>>>>>>>
>>>>>>> +static inline void v4l2_m2m_set_dst_ignore_streaming(struct v4l2_m2m_ctx *m2m_ctx,
>>>>>>> + bool ignore_streaming)
>>>>>>> +{
>>>>>>> + m2m_ctx->cap_q_ctx.ignore_streaming = ignore_streaming;
>>>>>>> +}
>>>>>>> +
>>>>>>
>>>>>> I think this is overkill, esp. when the field is moved to m2m_ctx. Just clearly
>>>>>> document that drivers can set this.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Hans
>>>>>>
>>>>>>> /**
>>>>>>> * v4l2_m2m_ctx_release() - release m2m context
>>>>>>> *
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>