Re: [RFC v2 6/8] media: uapi: Remove bit_size field from v4l2_ctrl_hevc_slice_params

From: John Cox
Date: Wed Feb 16 2022 - 05:38:11 EST


On Tue, 15 Feb 2022 21:27:30 +0100, you wrote:

>Dne torek, 15. februar 2022 ob 17:31:28 CET je John Cox napisal(a):
>> On Tue, 15 Feb 2022 17:11:12 +0100, you wrote:
>>
>> >Dne torek, 15. februar 2022 ob 17:00:33 CET je John Cox napisal(a):
>> >> On Tue, 15 Feb 2022 10:28:55 -0500, you wrote:
>> >>
>> >> >Le mardi 15 février 2022 à 14:50 +0000, John Cox a écrit :
>> >> >> On Tue, 15 Feb 2022 15:35:12 +0100, you wrote:
>> >> >>
>> >> >> >
>> >> >> > Le 15/02/2022 à 15:17, John Cox a écrit :
>> >> >> > > Hi
>> >> >> > >
>> >> >> > > > The bit size of the slice could be deduced from the buffer
>payload
>> >> >> > > > so remove bit_size field to avoid duplicated the information.
>> >> >> > > I think this is a bad idea. In the future we are (I hope) going to
>> >want
>> >> >> > > to have an array (variable) of slice headers all referring to the
>> >same
>> >> >> > > bit buffer. When we do that we will need this field.
>> >> >> >
>> >> >> > I wonder if that could be considering like another decode mode and
>so
>> >> >> > use an other control ?
>> >> >>
>> >> >> I, personally, would be in favour of making the slice header control a
>> >> >> variable array just as it is. If userland can't cope with multiple
>> >> >> entries then just send them one at a time and the code looks exactly
>> >> >> like it does at the moment and if the driver can't then set max array
>> >> >> entries to 1.
>> >> >>
>> >> >> Having implemented this in rpi port of ffmpeg and the RPi V4L2 driver I
>> >> >> can say with experience that the code and effort overhead is very low.
>> >> >>
>> >> >> Either way having a multiple slice header control in the UAPI is
>> >> >> important for efficiency.
>> >> >
>> >> >Just to clarify the idea, we would have a single slice controls, always
>> >dynamic:
>> >> >
>> >> >1. For sliced based decoder
>> >> >
>> >> >The dynamic array slice control is implemented by the driver and its
>size
>> >must
>> >> >be 1.
>> >>
>> >> Yes
>> >>
>> >> >2. For frame based decoder that don't care for slices
>> >> >
>> >> >The dynamic array slice controls is not implement. Userland detects that
>at
>> >> >runtime, similar to the VP9 compressed headers.
>> >>
>> >> If the driver parses all the slice header then that seems plausible
>> >>
>> >> >3. For frame based decoders that needs slices (or driver that supports
>> >offset
>> >> >and can gain performance with such mode)
>> >> >
>> >> >The dynamic array slice controls is implemented, and should contain all
>the
>> >> >slices found in the OUTPUT buffer.
>> >> >
>> >> >So the reason for this bit_size (not sure why its bits though, perhaps
>> >someone
>> >> >can educate me ?)
>> >>
>> >> RPi doesn't need bits and would be happy with bytes however
>> >> slice_segment data isn't byte aligned at the end so its possible that
>> >> there might be decoders out there that want an accurate length for that.
>> >
>> >There are two fields, please don't mix them up:
>> >
>> >__u32 bit_size;
>> >__u32 data_bit_offset; (changed to data_byte_offset in this series)
>> >
>> >data_bit_offset/data_byte_offset is useful, while bit_size is IMO not. If you
>> >have multiple slices in array, you only need to know start of the slice
>data
>> >and that offset is always offset from start of the buffer (absolute, it's not
>> >relative to previous slice data).
>>
>> No... or at least I think not. RPi needs the start and end of the
>> slice_segment_data elements of each slice.
>
>It would be good to know if size needs to be exact or can overshoot, like
>using end of buffer for that.
>
>Cedrus also wants to know slice data size, but it turns out that bigger than
>necessary size doesn't pose any problems. If that's not the case, then
>bit_size needs stay in for sure.

RPi needs the exact size (bytes will do - but I don't see the harm in
specifying it in bits in case some future h/w wants the extra precision
as long as we nail down exactly which bit we mean)

Regards

JC

>Best regards,
>Jernej
>
>> If slices are arranged in the
>> buffer with slice_segment_headers attached then I don't see how I get to
>> know that. Also if the OUTPUT buffer is just a bit of bitstream, which
>> might well be very convienient for some userspace, then it is legitimate
>> to have SEIs between slice headers so you can't even guarantee that your
>> coded slice segments are contiguous.
>>
>> Regards
>>
>> JC
>>
>> >Best regards,
>> >Jernej
>> >
>> >>
>> >> > Would be to let the driver offset inside the the single
>> >> >OUTPUT/bitstream buffer in case this is not automatically found by the
>> >driver
>> >> >(or that no start-code is needed). Is that last bit correct ? If so,
>should
>> >we
>> >> >change it to an offset rather then a size ? Shall we allow using offesets
>> >inside
>> >> >larger buffer (e.g. to avoid some memory copies) for the Sliced Base
>cases ?
>> >>
>> >> I use (in the current structure) data_bit_offset to find the start of
>> >> each slice's slice_segment_data within the OUTPUT buffer and bit_size to
>> >> find the end. RPi doesn't / can't parse the slice_header and so wants
>> >> all of that. Decoders that do parse the header might plausably want
>> >> header offsets too and it would facilitate zero copy of the bit buffer.
>> >>
>> >>
>> >> >> Regards
>> >> >>
>> >> >> John Cox
>> >> >>
>> >> >> > > > Signed-off-by: Benjamin Gaignard
><benjamin.gaignard@xxxxxxxxxxxxx>
>> >> >> > > > ---
>> >> >> > > > .../userspace-api/media/v4l/ext-ctrls-codec.rst | 3 ---
>> >> >> > > > drivers/staging/media/sunxi/cedrus/cedrus_h265.c | 11 +++
>> >+-------
>> >> >> > > > include/uapi/linux/v4l2-controls.h | 3 +--
>> >> >> > > > 3 files changed, 5 insertions(+), 12 deletions(-)
>> >> >> > > >
>> >> >> > > > diff --git a/Documentation/userspace-api/media/v4l/ext-ctrls-
>> >codec.rst b/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
>> >> >> > > > index 3296ac3b9fca..c3ae97657fa7 100644
>> >> >> > > > --- a/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
>> >> >> > > > +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
>> >> >> > > > @@ -2965,9 +2965,6 @@ enum
>v4l2_mpeg_video_hevc_size_of_length_field
>> >-
>> >> >> > > > :stub-columns: 0
>> >> >> > > > :widths: 1 1 2
>> >> >> > > >
>> >> >> > > > - * - __u32
>> >> >> > > > - - ``bit_size``
>> >> >> > > > - - Size (in bits) of the current slice data.
>> >> >> > > > * - __u32
>> >> >> > > > - ``data_bit_offset``
>> >> >> > > > - Offset (in bits) to the video data in the current slice
>> >data.
>> >> >> > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c b/
>> >drivers/staging/media/sunxi/cedrus/cedrus_h265.c
>> >> >> > > > index 8ab2d9c6f048..db8c7475eeb8 100644
>> >> >> > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
>> >> >> > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
>> >> >> > > > @@ -312,8 +312,8 @@ static void cedrus_h265_setup(struct
>cedrus_ctx
>> >*ctx,
>> >> >> > > > const struct v4l2_hevc_pred_weight_table
>*pred_weight_table;
>> >> >> > > > unsigned int width_in_ctb_luma, ctb_size_luma;
>> >> >> > > > unsigned int log2_max_luma_coding_block_size;
>> >> >> > > > + size_t slice_bytes;
>> >> >> > > > dma_addr_t src_buf_addr;
>> >> >> > > > - dma_addr_t src_buf_end_addr;
>> >> >> > > > u32 chroma_log2_weight_denom;
>> >> >> > > > u32 output_pic_list_index;
>> >> >> > > > u32 pic_order_cnt[2];
>> >> >> > > > @@ -370,8 +370,8 @@ static void cedrus_h265_setup(struct
>cedrus_ctx
>> >*ctx,
>> >> >> > > >
>> >> >> > > > cedrus_write(dev, VE_DEC_H265_BITS_OFFSET, 0);
>> >> >> > > >
>> >> >> > > > - reg = slice_params->bit_size;
>> >> >> > > > - cedrus_write(dev, VE_DEC_H265_BITS_LEN, reg);
>> >> >> > > > + slice_bytes = vb2_get_plane_payload(&run->src-
>>vb2_buf, 0);
>> >> >> > > > + cedrus_write(dev, VE_DEC_H265_BITS_LEN, slice_bytes);
>> >> >> > > I think one of these must be wrong. bit_size is in bits,
>> >> >> > > vb2_get_plane_payload is in bytes?
>> >> >> >
>> >> >> > You are right it should be vb2_get_plane_payload() * 8 to get the
>size
>> >in bits.
>> >> >> >
>> >> >> > I will change that in v3.
>> >> >> >
>> >> >> > >
>> >> >> > > Regards
>> >> >> > >
>> >> >> > > John Cox
>> >> >> > >
>> >> >> > > > /* Source beginning and end addresses. */
>> >> >> > > >
>> >> >> > > > @@ -384,10 +384,7 @@ static void cedrus_h265_setup(struct
>> >cedrus_ctx *ctx,
>> >> >> > > >
>> >> >> > > > cedrus_write(dev, VE_DEC_H265_BITS_ADDR, reg);
>> >> >> > > >
>> >> >> > > > - src_buf_end_addr = src_buf_addr +
>> >> >> > > > - DIV_ROUND_UP(slice_params-
>>bit_size,
>> >8);
>> >> >> > > > -
>> >> >> > > > - reg =
>VE_DEC_H265_BITS_END_ADDR_BASE(src_buf_end_addr);
>> >> >> > > > + reg = VE_DEC_H265_BITS_END_ADDR_BASE(src_buf_addr +
>slice_bytes);
>> >> >> > > > cedrus_write(dev, VE_DEC_H265_BITS_END_ADDR, reg);
>> >> >> > > >
>> >> >> > > > /* Coding tree block address */
>> >> >> > > > diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/
>> >linux/v4l2-controls.h
>> >> >> > > > index b1a3dc05f02f..27f5d272dc43 100644
>> >> >> > > > --- a/include/uapi/linux/v4l2-controls.h
>> >> >> > > > +++ b/include/uapi/linux/v4l2-controls.h
>> >> >> > > > @@ -2457,7 +2457,6 @@ struct v4l2_hevc_pred_weight_table {
>> >> >> > > > #define V4L2_HEVC_SLICE_PARAMS_FLAG_DEPENDENT_SLICE_SEGMENT
>> >(1ULL << 9)
>> >> >> > > >
>> >> >> > > > struct v4l2_ctrl_hevc_slice_params {
>> >> >> > > > - __u32 bit_size;
>> >> >> > > > __u32 data_bit_offset;
>> >> >> > > >
>> >> >> > > > /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header
>*/
>> >> >> > > > @@ -2484,7 +2483,7 @@ struct v4l2_ctrl_hevc_slice_params {
>> >> >> > > > /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing
>SEI message
>> >*/
>> >> >> > > > __u8 pic_struct;
>> >> >> > > >
>> >> >> > > > - __u8 reserved;
>> >> >> > > > + __u8 reserved[5];
>> >> >> > > >
>> >> >> > > > /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice
>segment
>> >header */
>> >> >> > > > __u32 slice_segment_addr;
>> >>
>> >
>>
>