Re: [PATCH 2/3] venus: Limit HFI sessions to the maximum supported

From: Alexandre Courbot
Date: Thu Nov 26 2020 - 21:13:25 EST


On Fri, Nov 27, 2020 at 7:42 AM Stanimir Varbanov
<stanimir.varbanov@xxxxxxxxxx> wrote:
>
>
>
> On 11/26/20 8:28 AM, Alexandre Courbot wrote:
> > On Wed, Nov 25, 2020 at 10:01 PM Stanimir Varbanov
> > <stanimir.varbanov@xxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 11/25/20 5:46 AM, Alexandre Courbot wrote:
> >>> On Fri, Nov 20, 2020 at 9:12 AM Stanimir Varbanov
> >>> <stanimir.varbanov@xxxxxxxxxx> wrote:
> >>>>
> >>>> Currently we rely on firmware to return error when we reach the maximum
> >>>> supported number of sessions. But this errors are happened at reqbuf
> >>>> time which is a bit later. The more reasonable way looks like is to
> >>>> return the error on driver open.
> >>>>
> >>>> To achieve that modify hfi_session_create to return error when we reach
> >>>> maximum count of sessions and thus refuse open.
> >>>>
> >>>> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@xxxxxxxxxx>
> >>>> ---
> >>>> drivers/media/platform/qcom/venus/core.h | 1 +
> >>>> drivers/media/platform/qcom/venus/hfi.c | 19 +++++++++++++++----
> >>>> .../media/platform/qcom/venus/hfi_parser.c | 3 +++
> >>>> 3 files changed, 19 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/media/platform/qcom/venus/core.h b/drivers/media/platform/qcom/venus/core.h
> >>>> index db0e6738281e..3a477fcdd3a8 100644
> >>>> --- a/drivers/media/platform/qcom/venus/core.h
> >>>> +++ b/drivers/media/platform/qcom/venus/core.h
> >>>> @@ -96,6 +96,7 @@ struct venus_format {
> >>>> #define MAX_CAP_ENTRIES 32
> >>>> #define MAX_ALLOC_MODE_ENTRIES 16
> >>>> #define MAX_CODEC_NUM 32
> >>>> +#define MAX_SESSIONS 16
> >>>>
> >>>> struct raw_formats {
> >>>> u32 buftype;
> >>>> diff --git a/drivers/media/platform/qcom/venus/hfi.c b/drivers/media/platform/qcom/venus/hfi.c
> >>>> index 638ed5cfe05e..8420be6d3991 100644
> >>>> --- a/drivers/media/platform/qcom/venus/hfi.c
> >>>> +++ b/drivers/media/platform/qcom/venus/hfi.c
> >>>> @@ -175,6 +175,7 @@ static int wait_session_msg(struct venus_inst *inst)
> >>>> int hfi_session_create(struct venus_inst *inst, const struct hfi_inst_ops *ops)
> >>>> {
> >>>> struct venus_core *core = inst->core;
> >>>> + int ret;
> >>>>
> >>>> if (!ops)
> >>>> return -EINVAL;
> >>>> @@ -183,12 +184,22 @@ int hfi_session_create(struct venus_inst *inst, const struct hfi_inst_ops *ops)
> >>>> init_completion(&inst->done);
> >>>> inst->ops = ops;
> >>>>
> >>>> - mutex_lock(&core->lock);
> >>>> - list_add_tail(&inst->list, &core->instances);
> >>>> - atomic_inc(&core->insts_count);
> >>>> + ret = mutex_lock_interruptible(&core->lock);
> >>>> + if (ret)
> >>>> + return ret;
> >>>
> >>> Why do we change to mutex_lock_interruptible() here? This makes this
> >>
> >> Because mutex_lock_interruptible is preferable in kernel docs, but I
> >> agree that changing mutex_lock with mutex_lock_interruptible should be
> >> subject of another lock related patches. I will drop this in next patch
> >> version.
> >>
> >>> function return an error even though we could obtain the lock just by
> >>> trying a bit harder.
> >>
> >> I didn't get that. The behavior of mutex_lock_interruptible is that same
> >> as mutex_lock, i.e. the it will sleep to acquire the lock. The
> >> difference is that the sleep could be interrupted by a signal. You might
> >> think about mutex_trylock?
> >
> > Unless that mutex can be held by someone else for a rather long time
> > (i.e. to the point where we may want to give priority to signals when
> > userspace opens the device, since that's where hfi_session_create() is
> > called), I am not convinced this change is necessary? It may confuse
>
> Exactly, if there is a case where the core->lock is taken (firmware
> recovery) and it is not unlocked for very long time (deadlock?) then
> client process cannot be interrupted with a signal.
>
> > userspace into thinking there was a serious error while there is none.
>
> The client should be able to handle EINTR, right?
>
> > Granted, userspace should manage this case, and from what I can see
> > this code is correct, but I'm not sure we would gain anything by
> > adding this extra complexity.
>
> The benefit is that if something wrong is happening in the driver the
> client process will be killable.

Ack, that definitely makes sense in that context, even though it
should probably be done separately from this patch series. :)

Cheers,
Alex.