V4L2 Encoders Pre-Processing Support Questions

From: Paul Kocialkowski
Date: Thu Oct 19 2023 - 05:39:34 EST


Hello,

While working on the Allwinner Video Engine H.264 encoder, I found that it has
some pre-processing capabilities. This includes things like chroma
down-sampling, colorspace conversion and scaling.

For example, this means that you can feed the encoder YUV 4:2:2 data and it
will down-sample it to 4:2:0, since that's the only sub-sampling the hardware
can encode. The same happens when providing e.g. RGB source pictures, which
will be converted to YUV 4:2:0 internally.
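
To make this concrete, here is a minimal sketch of what the current stateful
encoder negotiation looks like from userspace (fd being an open encoder video
node; the sizes and the YUYV choice are just an example):

  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  struct v4l2_format fmt = { 0 };

  /* Picture (OUTPUT) queue: raw 4:2:2 source frames. */
  fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
  fmt.fmt.pix.width = 1280;
  fmt.fmt.pix.height = 720;
  fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
  ioctl(fd, VIDIOC_S_FMT, &fmt);

  /* Coded (CAPTURE) queue: nothing in this exchange tells userspace
   * that the hardware will internally down-sample the 4:2:2 input
   * to 4:2:0 before encoding. */
  fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;
  ioctl(fd, VIDIOC_S_FMT, &fmt);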

I was wondering how all of this is currently dealt with and whether it should
be a topic of attention. As far as I can see, there is currently no practical
way for userspace to find out that such down-sampling will take place, even
though this is useful information to have.

Would it make sense to have an additional media entity between the source video
node and the encoder proc, with the actual pixel format configured on that
link? This would still be a video-node-centric device, so userspace would not
be expected to configure that link. But then what if the hardware can either
down-sample or keep the provided sub-sampling? How would userspace indicate
which behavior to select? It is maybe not great to let userspace configure the
pads when this is a video-node-centric driver.
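
As a purely hypothetical illustration (the entity and pad layout are made up,
only the ioctls are real), userspace could then at least read back the
internal format from the proc entity's pads and notice the down-sampling:

  #include <sys/ioctl.h>
  #include <linux/v4l2-subdev.h>

  struct v4l2_subdev_format sfmt = { 0 };

  sfmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;

  /* Sink pad: the format coming in from the source video node. */
  sfmt.pad = 0;
  ioctl(subdev_fd, VIDIOC_SUBDEV_G_FMT, &sfmt);
  /* sfmt.format.code would be e.g. MEDIA_BUS_FMT_YUYV8_2X8 (4:2:2). */

  /* Source pad: what the encoder core actually consumes. */
  sfmt.pad = 1;
  ioctl(subdev_fd, VIDIOC_SUBDEV_G_FMT, &sfmt);
  /* ... and a 4:2:0 code here would reveal the down-sampling. */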

Perhaps this could be a control, or the driver could decide to pick the least
destructive sub-sampling available based on the selected codec profile
(but this is still a guess that may not match the use case). With a control,
we probably don't need an extra media entity.
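
For illustration, a menu control along these lines could let userspace pick
between down-sampling and keeping the source sub-sampling. The control ID and
values below are entirely made up, only to show the idea:

  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  struct v4l2_ext_control ctrl = { 0 };
  struct v4l2_ext_controls ctrls = { 0 };

  ctrl.id = V4L2_CID_CHROMA_SUBSAMPLING_MODE; /* hypothetical control */
  ctrl.value = CHROMA_SUBSAMPLING_KEEP;       /* hypothetical: keep 4:2:2 */

  ctrls.count = 1;
  ctrls.controls = &ctrl;
  ioctl(fd, VIDIOC_S_EXT_CTRLS, &ctrls);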

Another topic is scaling. We can generally support scaling by allowing a
different size on the coded queue after configuring the picture queue.
However, there would be some interaction with the selection rectangle, which is
used to set the cropping rectangle on the *source* side. So the driver would
need to take this rectangle and scale it to match the coded size.
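
Under the current semantics, that would look roughly like this (just a sketch,
assuming the driver accepts a coded size differing from the picture size; the
NV12 choice and the sizes are arbitrary):

  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  struct v4l2_format fmt = { 0 };
  struct v4l2_selection sel = { 0 };

  /* Picture (OUTPUT) queue and its crop rectangle, on the source side. */
  fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
  fmt.fmt.pix.width = 1280;
  fmt.fmt.pix.height = 720;
  fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;
  ioctl(fd, VIDIOC_S_FMT, &fmt);

  sel.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
  sel.target = V4L2_SEL_TGT_CROP;
  sel.r = (struct v4l2_rect){ .left = 0, .top = 0,
                              .width = 640, .height = 360 };
  ioctl(fd, VIDIOC_S_SELECTION, &sel);

  /* Coded (CAPTURE) queue at a different size: the driver now has to
   * scale the 640x360 crop rectangle above to the 1920x1080 coded size. */
  fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  fmt.fmt.pix.width = 1920;
  fmt.fmt.pix.height = 1080;
  fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;
  ioctl(fd, VIDIOC_S_FMT, &fmt);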

The main inconsistency here is that the rectangle would no longer correspond to
what will be set in the bitstream, nor would the destination size, since it
does not account for the cropping rectangle by definition. It might be more
sensible to have the selection rectangle operate on the coded/destination
queue instead, but things are already specified to be the other way round.

Maybe a selection rectangle could be introduced for the coded queue too, which
would generally be propagated from the picture-side one, except in the case of
scaling where it would be used to clarify the actual final size (coded size
taking the cropping into account). In this case the source selection rectangle
would be understood as an actual source crop (which may not be supported by
hardware) instead of an indication for the codec metadata crop fields. And the
coded queue dimensions would need to take this source cropping into account,
which is somewhat contradictory with the current semantics. Perhaps we could
define that the source crop rectangle should be entirely ignored when scaling
is used, which would simplify things (although we would lose the ability to
support source cropping when the hardware can do it).

If operating on the source selection rectangle only (no second rectangle on the
coded queue), some cases would be impossible to reach, for instance scaling to
aligned coded dimensions while signaling unaligned visible ones (e.g. a
1280x720 source scaled to 1920x1088 where we want the codec cropping fields to
indicate 1920x1080).
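
A hypothetical sketch of that two-rectangle setup, reusing the example above
(using V4L2_SEL_TGT_COMPOSE on the encoder CAPTURE queue, which is not
specified today, purely to illustrate the idea):

  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  struct v4l2_format fmt = { 0 };
  struct v4l2_selection sel = { 0 };

  /* Coded (CAPTURE) queue at the macroblock-aligned size. */
  fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  fmt.fmt.pix.width = 1920;
  fmt.fmt.pix.height = 1088;
  fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;
  ioctl(fd, VIDIOC_S_FMT, &fmt);

  /* Hypothetical coded-queue rectangle carrying the visible size that
   * the bitstream cropping fields should signal. */
  sel.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  sel.target = V4L2_SEL_TGT_COMPOSE; /* not specified for encoders today */
  sel.r = (struct v4l2_rect){ .left = 0, .top = 0,
                              .width = 1920, .height = 1080 };
  ioctl(fd, VIDIOC_S_SELECTION, &sel);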

Anyway, I just wanted to check whether people have already thought about these
topics; I'm mostly thinking out loud and of course not saying we need to solve
these problems now.

Sorry again for the long email; I hope the points I'm making are somewhat
understandable.

Cheers,

Paul

--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
