Re: [PATCH] drm/mgag200: Flush the cache to improve latency

From: Thomas Zimmermann
Date: Fri Oct 20 2023 - 08:06:36 EST


(cc'ing lkml for feedback)

Hi Jocelyn

Am 19.10.23 um 15:55 schrieb Jocelyn Falempe:
We found a regression in v5.10 on real-time server, using the
rt-kernel and the mgag200 driver. It's some really specialized
workload, with <10us latency expectation on isolated core.
After the v5.10, the real time tasks missed their <10us latency
when something prints on the screen (fbcon or printk)

I'd like to hear the opinion of the RT-devs on this patch. Because AFAIK we never did such a workaround in other drivers. And AFAIK printk is a PITA anyway.

IMHO if that RT system cannot handle differences in framebuffer caching, it's under-powered. It's just a matter of time until something else changes and the problem returns. And (honest question) as it's an x86-64, how do they handle System Management Mode?


The regression has been bisected to 2 commits:
0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")

The first one changed the system memory framebuffer from Write-Combine
to the default caching.
Before the second commit, the mgag200 driver used to unmap the
framebuffer after each frame, which implicitly does a cache flush.
Both regressions are fixed by the following patch, which forces a
cache flush after each frame, reverting to almost v5.9 behavior.

With that second commit, we essentially never unmap an active framebuffer console. But with commit

359c6649cd9a ("drm/gem: Implement shadow-plane {begin, end}_fb_access with vmap")

we now again unmap the console framebuffer after the pageflip happened.

So how does the latest kernel behave wrt to the problem?

This is necessary only if you have strong realtime constraints, so I
put the cache flush under the CONFIG_PREEMPT_RT config flag.
Also clflush is only availabe on x86, (and this issue has only been
reproduced on x86_64) so it's also under the CONFIG_X86 config flag.

Fixes: 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
Fixes: 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
Signed-off-by: Jocelyn Falempe <jfalempe@xxxxxxxxxx>
---
drivers/gpu/drm/mgag200/mgag200_mode.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
index af3ce5a6a636..11660cd29cea 100644
--- a/drivers/gpu/drm/mgag200/mgag200_mode.c
+++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
@@ -13,6 +13,7 @@
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
+#include <drm/drm_cache.h>
#include <drm/drm_damage_helper.h>
#include <drm/drm_format_helper.h>
#include <drm/drm_fourcc.h>
@@ -436,6 +437,10 @@ static void mgag200_handle_damage(struct mga_device *mdev, const struct iosys_ma
iosys_map_incr(&dst, drm_fb_clip_offset(fb->pitches[0], fb->format, clip));
drm_fb_memcpy(&dst, fb->pitches, vmap, fb, clip);
+ /* On RT systems, flushing the cache reduces the latency for other RT tasks */
+#if defined(CONFIG_X86) && defined(CONFIG_PREEMPT_RT)
+ drm_clflush_virt_range(vmap, fb->height * fb->pitches[0]);
+#endif

Your second commit is part of a larger patchset that updates several drivers. They might all be affected. So if anything, the patch should go here before the unmap call:

https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/drm_gem_atomic_helper.c#L377

with a much expanded comment.

But I'd really like to hear other peoples' opinions on the matter.

Best regards
Thomas

}
/*

base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature