Re: [PATCH v7 45/49] media: core: Add bitmap manage bufs array entries

From: Hans Verkuil
Date: Thu Sep 21 2023 - 14:11:11 EST


On 21/09/2023 14:46, Benjamin Gaignard wrote:
>
> Le 21/09/2023 à 14:13, Hans Verkuil a écrit :
>> On 21/09/2023 14:05, Benjamin Gaignard wrote:
>>> Le 21/09/2023 à 12:24, Hans Verkuil a écrit :
>>>> On 21/09/2023 11:28, Benjamin Gaignard wrote:
>>>>> Le 20/09/2023 à 16:56, Hans Verkuil a écrit :
>>>>>> On 20/09/2023 16:30, Benjamin Gaignard wrote:
>>>>>> <snip>
>>>>>>
>>>>>>>>>          num_buffers = min_t(unsigned int, num_buffers,
>>>>>>>>>                      q->max_allowed_buffers - vb2_get_num_buffers(q));
>>>>>>>>>      -    first_index = vb2_get_num_buffers(q);
>>>>>>>>> +    first_index = bitmap_find_next_zero_area(q->bufs_map, q->max_allowed_buffers,
>>>>>>>>> +                         0, num_buffers, 0);
>>>>>>>>>            if (first_index >= q->max_allowed_buffers)
>>>>>>>>>              return 0;
>>>>>>>>> @@ -675,7 +678,13 @@ static void __vb2_queue_free(struct vb2_queue *q, unsigned int buffers)
>>>>>>>>>        struct vb2_buffer *vb2_get_buffer(struct vb2_queue *q, unsigned int index)
>>>>>>>>>      {
>>>>>>>>> -    if (index < q->num_buffers)
>>>>>>>>> +    if (!q->bufs_map || !q->bufs)
>>>>>>>>> +        return NULL;
>>>>>>>> I don't think this can ever happen.
>>>>>>> I got kernel crash without them.
>>>>>>> I will keep them.
>>>>>> What is the backtrace? How can this happen? It feels wrong that this can be
>>>>>> called with a vb2_queue that apparently is not properly initialized.
>>>>> I have this log when adding dump_stack() in vb2_get_buffer() if !q->bufs_bitmap:
>>>>>
>>>>> [   18.924627] Call trace:
>>>>> [   18.927090]  dump_backtrace+0x94/0xec
>>>>> [   18.930787]  show_stack+0x18/0x24
>>>>> [   18.934137]  dump_stack_lvl+0x48/0x60
>>>>> [   18.937833]  dump_stack+0x18/0x24
>>>>> [   18.941166]  __vb2_queue_cancel+0x23c/0x2f0
>>>>> [   18.945365]  vb2_core_queue_release+0x24/0x6c
>>>>> [   18.949740]  vb2_queue_release+0x10/0x1c
>>>>> [   18.953677]  v4l2_m2m_ctx_release+0x20/0x40
>>>>> [   18.957892]  hantro_release+0x20/0x54
>>>>> [   18.961584]  v4l2_release+0x74/0xec
>>>>> [   18.965110]  __fput+0xb4/0x274
>>>>> [   18.968205]  __fput_sync+0x50/0x5c
>>>>> [   18.971626]  __arm64_sys_close+0x38/0x7c
>>>>> [   18.975562]  invoke_syscall+0x48/0x114
>>>>> [   18.979329]  el0_svc_common.constprop.0+0xc0/0xe0
>>>>> [   18.984068]  do_el0_svc+0x1c/0x28
>>>>> [   18.987402]  el0_svc+0x40/0xe8
>>>>> [   18.990470]  el0t_64_sync_handler+0x100/0x12c
>>>>> [   18.994842]  el0t_64_sync+0x190/0x194
>>>>>
>>>>> This happen at boot time when hantro driver is open and close without other actions.
>>>> Ah, now I see the problem. q->bufs and q->bufs_map are allocated in
>>>> vb2_core_create_bufs and vb2_core_reqbufs, but they should be allocated
>>>> in vb2_queue_init: that's the counterpart of vb2_core_queue_release.
>>>>
>>>> With that change you shouldn't have to check for q->bufs/bufs_map anymore.
>>> It is a better solution but even like this vb2_core_queue_release() is called
>>> at least 2 times on the same vivid queue and without testing q->bufs_bitmap
>>> makes kernel crash.
>> Do you have a stacktrace for that? Perhaps vb2_core_queue_release should check
>> for q->bufs/q->bufs_map and return if those are NULL. But it could also be a
>> bug that it is called twice, it just was never noticed because it was harmless
>> before.
>
> I have added some printk to log that when running test-media on vivid:
>
> [  130.497426] vb2_core_queue_init queue cap-0000000050d195ab allocate q->bufs 00000000dc2c15ed and q->bufs_bitmap 000000008173fc5a
> ...
> [  130.733967] vb2_core_queue_release queue cap-0000000050d195ab release q->bufs and q->bufs_bitmap
> [  133.866345] vb2_get_buffer queue cap-0000000050d195ab q->bufs_bitmap is NULL
> [  133.873454] CPU: 1 PID: 321 Comm: v4l2-ctl Not tainted 6.6.0-rc1+ #542
> [  133.879997] Hardware name: NXP i.MX8MQ EVK (DT)
> [  133.884536] Call trace:
> [  133.886988]  dump_backtrace+0x94/0xec
> [  133.890673]  show_stack+0x18/0x24
> [  133.894002]  dump_stack_lvl+0x48/0x60
> [  133.897681]  dump_stack+0x18/0x24
> [  133.901009]  __vb2_queue_cancel+0x250/0x31c
> [  133.905209]  vb2_core_queue_release+0x24/0x88
> [  133.909580]  _vb2_fop_release+0xb0/0xbc
> [  133.913428]  vb2_fop_release+0x2c/0x58
> [  133.917187]  vivid_fop_release+0x80/0x388 [vivid]
> [  133.921948]  v4l2_release+0x74/0xec
> [  133.925452]  __fput+0xb4/0x274
> [  133.928520]  __fput_sync+0x50/0x5c
> [  133.931934]  __arm64_sys_close+0x38/0x7c
> [  133.935868]  invoke_syscall+0x48/0x114
> [  133.939630]  el0_svc_common.constprop.0+0x40/0xe0
> [  133.944349]  do_el0_svc+0x1c/0x28
> [  133.947677]  el0_svc+0x40/0xe8
> [  133.950741]  el0t_64_sync_handler+0x100/0x12c
> [  133.955109]  el0t_64_sync+0x190/0x194
>
> and later I have a call to reqbufs on the same queue without call to vb2_core_queue_init before
>
> [   58.696812] __vb2_queue_alloc queue cap- 0000000050d195abq->bufs_bitmap is NULL
> [   58.704148] CPU: 1 PID: 319 Comm: v4l2-compliance Not tainted 6.6.0-rc1+ #544
> [   58.711291] Hardware name: NXP i.MX8MQ EVK (DT)
> [   58.715826] Call trace:
> [   58.718274]  dump_backtrace+0x94/0xec
> [   58.721951]  show_stack+0x18/0x24
> [   58.725274]  dump_stack_lvl+0x48/0x60
> [   58.728946]  dump_stack+0x18/0x24
> [   58.732268]  __vb2_queue_alloc+0x4a8/0x50c
> [   58.736374]  vb2_core_reqbufs+0x274/0x46c
> [   58.740391]  vb2_ioctl_reqbufs+0xb0/0xe8
> [   58.744320]  vidioc_reqbufs+0x50/0x64 [vivid]
> [   58.748717]  v4l_reqbufs+0x50/0x64
> [   58.752125]  __video_do_ioctl+0x164/0x3c8
> [   58.756140]  video_usercopy+0x200/0x668
> [   58.759982]  video_ioctl2+0x18/0x28
> [   58.763475]  v4l2_ioctl+0x40/0x60
> [   58.766798]  __arm64_sys_ioctl+0xac/0xf0
> [   58.770730]  invoke_syscall+0x48/0x114
> [   58.774487]  el0_svc_common.constprop.0+0x40/0xe0
> [   58.779199]  do_el0_svc+0x1c/0x28
> [   58.782520]  el0_svc+0x40/0xe8
> [   58.785580]  el0t_64_sync_handler+0x100/0x12c
> [   58.789942]  el0t_64_sync+0x190/0x194

Argh, I see what is happening. The root cause is that vb2_core_queue_release
is actually not a true counterpart to vb2_core_queue_init.

The '_release' part refers to when a file handle is released, and not to
releasing resources allocated in queue_init.

The queue_init function never actually allocated any resources, so there
was never a reason to make a counterpart to that, but now that bites us.

Changing this would be a huge amount of work, and it is not worth the
effort, IMHO.

But at least we shouldn't have to test for both bufs and bufs_map,
they are either both set or both NULL. Just test one of the two.

The vb2_core_queue_init() function documentation in the header
should perhaps be more clear about the fact that this function
does not allocate any resources, and that there is no cleanup
counterpart.

It is what got me confused...

Regards,

Hans

>
>>
>> Regards,
>>
>>     Hans
>>
>>>> Regards,
>>>>
>>>>      Hans
>>>>
>>>>>    
>>>>>>>>> +
>>>>>>>>> +    return (bitmap_weight(q->bufs_map, q->max_allowed_buffers) > 0);
>>>>>>>> How about:
>>>>>>>>
>>>>>>>>        return vb2_get_num_buffers(q) > 0;
>>>>>>> vb2_get_num_buffers is defined in videobuf2-core.c, I'm not sure that
>>>>>>> an inline function could depend of a module function.
>>>>>> Not a problem. E.g. v4l2-ctrls.h is full of such static inlines.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>       Hans
>>>>>>
>>