Re: [PATCH v2 6/9] drm/vkms: Add YUV support

From: Pekka Paalanen
Date: Fri Mar 01 2024 - 06:53:47 EST


On Thu, 29 Feb 2024 14:57:06 -0300
Arthur Grillo <arthurgrillo@xxxxxxxxxx> wrote:

> On 29/02/24 09:12, Pekka Paalanen wrote:
> > On Wed, 28 Feb 2024 22:52:09 -0300
> > Arthur Grillo <arthurgrillo@xxxxxxxxxx> wrote:
> >
> >> On 27/02/24 17:01, Arthur Grillo wrote:
> >>>
> >>>
> >>> On 27/02/24 12:02, Louis Chauvet wrote:
> >>>> Hi Pekka,
> >>>>
> >>>> For all the comment related to the conversion part, maybe Arthur have an
> >>>> opinion on it, I took his patch as a "black box" (I did not want to
> >>>> break (and debug) it).
> >>>>
> >>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
> >>>>> On Fri, 23 Feb 2024 12:37:26 +0100
> >>>>> Louis Chauvet <louis.chauvet@xxxxxxxxxxx> wrote:
> >>>>>
> >>>>>> From: Arthur Grillo <arthurgrillo@xxxxxxxxxx>
> >>>>>>
> >>>>>> Add support to the YUV formats bellow:
> >>>>>>
> >>>>>> - NV12
> >>>>>> - NV16
> >>>>>> - NV24
> >>>>>> - NV21
> >>>>>> - NV61
> >>>>>> - NV42
> >>>>>> - YUV420
> >>>>>> - YUV422
> >>>>>> - YUV444
> >>>>>> - YVU420
> >>>>>> - YVU422
> >>>>>> - YVU444
> >>>>>>
> >>>>>> The conversion matrices of each encoding and range were obtained by
> >>>>>> rounding the values of the original conversion matrices multiplied by
> >>>>>> 2^8. This is done to avoid the use of fixed point operations.
> >>>>>>
> >>>>>> Signed-off-by: Arthur Grillo <arthurgrillo@xxxxxxxxxx>
> >>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> >>>>>> callbacks for yuv formats]
> >>>>>> Signed-off-by: Louis Chauvet <louis.chauvet@xxxxxxxxxxx>
> >>>>>> ---
> >>>>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> >>>>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> >>>>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> >>>>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> >>>>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> >>>>>> 5 files changed, 295 insertions(+), 20 deletions(-)

..

> >> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> index f66584549827..4cee3c2d8d84 100644
> >> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
> >> {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> >> {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> >> {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> >> - {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> >> + {"red", {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> >> {"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
> >> {"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
> >> },
> >> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >> index e06bbd7c0a67..043f23dbf80d 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
> >> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> >> }
> >>
> >> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> >> +#define BIT_DEPTH 32
> >> +
> >> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> >> {
> >> - s32 y_16, cb_16, cr_16;
> >> - s32 r_16, g_16, b_16;
> >> + s64 y_16, cb_16, cr_16;
> >> + s64 r_16, g_16, b_16;
> >>
> >> y_16 = y - y_offset;
> >> cb_16 = cb - 128;
> >> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
> >> g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> >> b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
> >>
> >> - *r = clamp(r_16, 0, 0xffff) >> 8;
> >> - *g = clamp(g_16, 0, 0xffff) >> 8;
> >> - *b = clamp(b_16, 0, 0xffff) >> 8;
> >> + // rounding the values
> >> + r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
> >> + g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
> >> + b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
> >> +
> >> + r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >> + g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >> + b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >
> > Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?
>
> Basically, the numbers are in this form in hex:
>
> 0xsspppppppp
>
> In the end, we only want the 's' bits.
>
> The matrix multiplication is not giving us perfect results, making some
> of KUnit test not pass, This is because the values end up a little bit
> off. KUnit expects 0xfe, but this functions is returning 0xfd.
>
> I noticed that before shifting the values to get the 's' bytes the
> values were a lot close to what is expected, something like:
>
> 0xfdfe287312
> ^
> So the rounding part adds 1 to this marked 'f' to round a bit the values
> (drm_fixed.h does something similar on drm_fixp2int_round).
> Like that:
>
> 0xfdfe287312
> + 0x0010000000
> ------------
> 0xfe0e287312
>
> That's why the BIT_DEPTH - 4.

I have a hard time deciphering this. There is some sort of strange
combination of UNORM and fixed-point going on here, where you process
the range 0.0 - 255.0 including 32-bit fraction. All variables being
named "_16" does not help, I've no idea what that refers to.

Usually when you have unsigned pixel format type, it's UNORM, that is
an unsigned integer representation that maps to [0.0, 1.0]. When
converting UNORM properly to e.g. fixed-point, you don't have to
consider the UNORM bit depth when computing in fixed-point.

There is a catch: since 0xff maps to 1.0, the divisor is 0xff, and not
a bit shift by 8. This must be taken into account when converting
between different depths of UNORM, or between UNORM and fixed-point.
Converting between different depths of fixed-point does not have this
problem.

If you want to proper rounding, meaning that 0.5 rounds up to 1.0 and
0.4999 rounds down to 0.0 when rounding to integers, you have to add
0.5 before truncating.

So in this case you need to add 0x0100_0000 / 2 = 0x0080_0000, not
0x0010_0000.

I don't understand what drm_fixp2int_round() is even doing. The offset
is not 0.5, it's 0.0000076.

> After that, the values need to be clamped to not get wrong results when
> shifting this s64 and casting it to u8. We clamp it to the minimum
> allowed value: 0, and to the maximum allowed value, which in this case
> is all the (BIT_DEPTH + 8) bits set to 1, The '+ 8' is to account for
> the size of the 's' bits.

Ok. You could also shift with >> BIT_DEPTH first, and then clamp to 0,
255.


Thanks,
pq

> After writing this, I think that maybe it would be good to add this
> explanation as a comment on the code.
>
> >
> >> +
> >> + *r = r_16 >> BIT_DEPTH;
> >> + *g = g_16 >> BIT_DEPTH;
> >> + *b = b_16 >> BIT_DEPTH;
> >> }

Attachment: pgp0bAR4GFlwY.pgp
Description: OpenPGP digital signature