Re: [PATCH v2 6/9] drm/vkms: Add YUV support

From: Louis Chauvet
Date: Mon Mar 04 2024 - 10:29:28 EST


Le 29/02/24 - 14:12, Pekka Paalanen a écrit :
> On Wed, 28 Feb 2024 22:52:09 -0300
> Arthur Grillo <arthurgrillo@xxxxxxxxxx> wrote:
>
> > On 27/02/24 17:01, Arthur Grillo wrote:
> > >
> > >
> > > On 27/02/24 12:02, Louis Chauvet wrote:
> > >> Hi Pekka,
> > >>
> > >> For all the comment related to the conversion part, maybe Arthur have an
> > >> opinion on it, I took his patch as a "black box" (I did not want to
> > >> break (and debug) it).
> > >>
> > >> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
> > >>> On Fri, 23 Feb 2024 12:37:26 +0100
> > >>> Louis Chauvet <louis.chauvet@xxxxxxxxxxx> wrote:
> > >>>
> > >>>> From: Arthur Grillo <arthurgrillo@xxxxxxxxxx>
> > >>>>
> > >>>> Add support to the YUV formats bellow:
> > >>>>
> > >>>> - NV12
> > >>>> - NV16
> > >>>> - NV24
> > >>>> - NV21
> > >>>> - NV61
> > >>>> - NV42
> > >>>> - YUV420
> > >>>> - YUV422
> > >>>> - YUV444
> > >>>> - YVU420
> > >>>> - YVU422
> > >>>> - YVU444
> > >>>>
> > >>>> The conversion matrices of each encoding and range were obtained by
> > >>>> rounding the values of the original conversion matrices multiplied by
> > >>>> 2^8. This is done to avoid the use of fixed point operations.
> > >>>>
> > >>>> Signed-off-by: Arthur Grillo <arthurgrillo@xxxxxxxxxx>
> > >>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> > >>>> callbacks for yuv formats]
> > >>>> Signed-off-by: Louis Chauvet <louis.chauvet@xxxxxxxxxxx>
> > >>>> ---
> > >>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> > >>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> > >>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> > >>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> > >>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> > >>>> 5 files changed, 295 insertions(+), 20 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> index e555bf9c1aee..54fc5161d565 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> > >>>> * buffer [1]
> > >>>> */
> > >>>> current_plane->pixel_read_line(
> > >>>> - current_plane->frame_info,
> > >>>> + current_plane,
> > >>>> x_start,
> > >>>> y_start,
> > >>>> direction,
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> index ccc5be009f15..a4f6456cb971 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
> > >>>> READ_RIGHT
> > >>>> };
> > >>>>
> > >>>> +struct vkms_plane_state;
> > >>>> +
> > >>>> /**
> > >>>> <<<<<<< HEAD
> > >>>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> > >>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
> > >>>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> > >>>> * x_end.
> > >>>> */
> > >>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> > >>>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> > >>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> > >>>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> > >>>
> > >>> This is the second or third time in this one series changing this type.
> > >>> Could you not do the change once, in its own patch if possible?
> > >>
> > >> Sorry, this is not a change here, but a wrong formatting (missed when
> > >> rebasing).
> > >>
> > >> Do you think that it make sense to re-order my patches and put this
> > >> typedef at the end? This way it is never updated.
>
> I'm not sure, I haven't checked how it would change your patches. The
> intermediate changes might get a lot uglier?
>
> Just try to fold changes so that you don't need to change something
> twice over the series unless there is a good reason to. "How hard would
> it be to review this?" is my measure stick.

It will not be uglier, it was just the order I did things. I first cleaned
the code and created this typedef (PATCHv2 4/9), and then rewrote the
composition, for which I had to change the typedef.

I also wanted to make my series easy to understand and make clear what is
my "main contribution" and what are "quality stuff, not related to my
contribution":
- Prepare things (document existing state, format, typedef)
- Big change (and update related doc, typedef)
- Rebase some other stuff on my big change (YUV)

So yes, some parts are changed twice in preparation step and the "big
change".

>
> > >>
> > >>>>
> > >>>> /**
> > >>>> * vkms_plane_state - Driver specific plane state
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> index 46daea6d3ee9..515c80866a58 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> > >>>> */
> > >>>> return fb->offsets[plane_index] +
> > >>>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> > >>>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> > >>>> + (x / drm_format_info_block_height(format, plane_index)) *
> > >>>> + format->char_per_block[plane_index];
> > >>>
> > >>> Shouldn't this be in the patch that added this code in the first place?
> > >>
> > >> Same as above, a wrong formatting, I will remove this change and keep
> > >> everything on one line (even if it's more than 100 chars, it is easier to
> > >> read).
>
> Personally I agree that readability is more important than strict line
> length limits. I'm not sure how the kernel rolls there.
>
> > >>
> > >>>> }
> > >>>>
> > >>>> /**
> > >>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> > >>>> }
> > >>>> }
> > >>>>
> > >>>> +/**
> > >>>> + * get_subsampling() - Get the subsampling value on a specific direction
> > >>>
> > >>> subsampling divisor
> > >>
> > >> Thanks for this precision.
> > >>
> > >>>> + */
> > >>>> +static int get_subsampling(const struct drm_format_info *format,
> > >>>> + enum pixel_read_direction direction)
> > >>>> +{
> > >>>> + if (direction == READ_LEFT || direction == READ_RIGHT)
> > >>>> + return format->hsub;
> > >>>> + else if (direction == READ_DOWN || direction == READ_UP)
> > >>>> + return format->vsub;
> > >>>> + return 1;
> > >>>
> > >>> In this and the below function, personally I'd prefer switch-case, with
> > >>> a cannot-happen-scream after the switch, so the compiler can warn about
> > >>> unhandled enum values.
> > >>
> > >> As for the previous patch, I did not know about this compiler feature,
> > >> thanks!
> > >>
> > >>>> +}
> > >>>> +
> > >>>> +/**
> > >>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> > >>>> + */
> > >>>> +static int get_subsampling_offset(const struct drm_format_info *format,
> > >>>> + enum pixel_read_direction direction, int x_start, int y_start)
> > >>>
> > >>> 'start' values as "increments" for a pixel counter? Is something
> > >>> misnamed here?
> > >>>
> > >>> Is it an increment or an offset?
> > >>
> > >> I don't really know how to name the function. I'm open to suggestions
> > >> x_start and y_start are really the coordinate of the starting reading point.
>
> I looks like it's an offset, so "offset" and "start" are good words.
> Then the only misleading piece is the doc:
>
> "Get the subsampling offset to use when incrementing the pixel counter"
>
> This sounds like the offset is used when incrementing a counter, that
> is, counter is increment by offset each time. That's my problem with
> this.
>
> Fix just the doc, and it's good, I think.
>
> > >>
> > >> To explain what it does:
> > >>
> > >> When using subsampling, you have to read the next pixel of planes[1..4]
> > >> not at the same "speed" as plane[0]. But I can't only rely on
> > >> "read_pixel_count % subsampling == 0", because it means that the pixel
> > >> incrementation on planes[1..4] may not be aligned with the buffer (if
> > >> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only
> > >> for x=2,4,6... not 1,3,5...).
> > >>
> > >> A way to ensure this is to add an "offset" to count, which ensure that the
> > >> count % subsampling == 0 on the correct pixel.
>
> Yes, I think I did get that feeling from the code eventually somehow,
> but it wouldn't hurt to explain it in the comment.
>
> "An offset for keeping the chroma siting consistent regardless of
> x_start and y_start" maybe?

It is better yes, thanks!

> > >>
> > >> I made an error, the switch case must be (as count is always counting up,
> > >> for "inverted" reading direction a negative number ensure that
> > >> %subsampling == 0 on the correct pixel):
> > >>
> > >> switch (direction) {
> > >> case READ_UP:
> > >> return -y_start;
> > >> case READ_DOWN:
> > >> return y_start;
> > >> case READ_LEFT:
> > >> return -x_start;
> > >> case READ_RIGHT:
> > >> return x_start;
> > >> }
>
> Yes, the inverted reading directions are different indeed. I did not
> think through if this works also for sub-sampling divisors > 2 which I
> don't think are ever used.

I choosen those values because they should work with any sub-sampling
divisor.

hsub/vsub = 4 is used with DRM_FORMAT_YUV410/YVU410/YUV411/YVU411.

>
> Does IGT find this mistake? If not, maybe IGT should be extended.

No, for two reasons:
- The original version works fine for NV12/16/24 and YUV with *sub <= 2
(x+n%2 == x-n%2). It only breaks for *sub > 2.
- YUV410/... are not supported by VKMS
- IGT does not test different colors for rotations/translations (at least
for the tests I tried). I will see if it's possible to add things in
kms_rotation_crc/kms_cursor_crc to test more colors format (at least
one RGB and one YUV).

> > >>
> > >>>> +{
> > >>>> + if (direction == READ_RIGHT || direction == READ_LEFT)
> > >>>> + return x_start;
> > >>>> + else if (direction == READ_DOWN || direction == READ_UP)
> > >>>> + return y_start;
> > >>>> + return 0;
> > >>>> +}
> > >>>> +
> > >>
> > >> [...]
> > >>
> > >>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> > >>>> + enum drm_color_encoding encoding, enum drm_color_range range)
> > >>>> +{
> > >>>> + static const s16 bt601_full[3][3] = {
> > >>>> + { 256, 0, 359 },
> > >>>> + { 256, -88, -183 },
> > >>>> + { 256, 454, 0 },
> > >>>> + };
> > >>
> > >> [...]
> > >>
> > >>>> +
> > >>>> + u8 r = 0;
> > >>>> + u8 g = 0;
> > >>>> + u8 b = 0;
> > >>>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> > >>>> + unsigned int y_offset = full ? 0 : 16;
> > >>>> +
> > >>>> + switch (encoding) {
> > >>>> + case DRM_COLOR_YCBCR_BT601:
> > >>>> + ycbcr2rgb(full ? bt601_full : bt601,
> > >>>
> > >>> Doing all these conditional again pixel by pixel is probably
> > >>> inefficient. Just like with the line reading functions, you could pick
> > >>> the matrix in advance.
> > >>
> > >> I don't think the performance impact is huge (it's only a pair of if), but
> > >> yes, it's an easy optimization.
> > >>
> > >> I will create a conversion_matrix structure:
> > >>
> > >> struct conversion_matrix {
> > >> s16 matrix[3][3];
> > >> u16 y_offset;
> > >> }
>
> When defining such a struct type, it would be good to document the
> matrix layout (which one is row, which one is column), and what the s16
> mean (fixed point?).

Ack

> Try to not mix signed and unsigned types, too. The C implicit type
> promotion rules can be surprising. Just make everything signed while
> computing, and convert to/from unsigned only for storage.

Ack, I will change to signed type.

> > >>
> > >> I will create a `get_conversion_matrix_to_argb_u16` function to get this
> > >> structure from a format+encoding+range.
> > >>
> > >> I will also add a field `conversion_matrix` in struct vkms_plane_state to
> > >> get this matrix only once per plane setup.
>
> Alright. Let's see how that works.
>
> > >>
> > >>
> > >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> + break;
> > >>>> + case DRM_COLOR_YCBCR_BT709:
> > >>>> + ycbcr2rgb(full ? rec709_full : rec709,
> > >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> + break;
> > >>>> + case DRM_COLOR_YCBCR_BT2020:
> > >>>> + ycbcr2rgb(full ? bt2020_full : bt2020,
> > >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> + break;
> > >>>> + default:
> > >>>> + pr_warn_once("Not supported color encoding\n");
> > >>>> + break;
> > >>>> + }
> > >>>> +
> > >>>> + argb_u16->r = r * 257;
> > >>>> + argb_u16->g = g * 257;
> > >>>> + argb_u16->b = b * 257;
> > >>>
> > >>> I wonder. Using 8-bit fixed point precision seems quite coarse for
> > >>> 8-bit pixel formats, and it's going to be insufficient for higher bit
> > >>> depths. Was supporting e.g. 10-bit YUV considered? There is even
> > >>> deeper, too, like DRM_FORMAT_P016.
> > >>
> > >> It's a good point, as I explained above, I took the conversion part as a
> > >> "black box" to avoid breaking (and debugging) stuff. I think it's easy to
> > >> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in
> > >> the float part).
> > >>
> > >> Maybe Arthur have an opinion on this?
> > >
> > > Yeah, I too don't see why not we could do that. The 8-bit precision was
> > > sufficient for those formats, but as well noted by Pekka this could be a
> > > problem for higher bit depths. I just need to make my terrible python
> > > script spit those values XD.
> >
> > Finally, I got it working with 32-bit precision.
> >
> > I basically threw all my untrusted python code away, and started using
> > the colour python framework suggested by Sebastian[1]. After knowing the
> > right values (and staring at numbers for hours), I found that with a
> > little bit of rounding, the conversion works.
> >
> > Also, while at it, I changed the name rec709 to bt709 to follow the
> > pattern and added "_full" to the full ranges matrices.
> >
> > While using the library, I noticed that the red component is wrong on
> > the color red in one test case.
> >
> > [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
>
> That all sounds good. I wish the kernel code contained comments
> explaining how exactly you computed those matrices with python/colour.
> If the python snippets are not too long, including them verbatim as
> code comments would be really nice for both reviewers and posterity.
>
> The same for the VKMS unit tests, too, how you got the expected result
> values.

I edited the YUV support to have those s64 values.

@arthur, I will submit a v4 with this:
- matrix selection in plane_atomic_update (so it's selected only once)
- s64 numbers for matrix
- avoiding multiple loop implementation by switching matrix columns

Regarding the YUV part, I don't feel confortable adressing Pekka's
comments, would you mind doing it?

Kind regards,
Louis Chauvet

[...]

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com