Re: [RFC 0/4] dma-fence: Deadline awareness

From: Christian König
Date: Thu Jul 29 2021 - 09:41:23 EST

Next message: Rafael J. Wysocki: "Re: [PATCH v2] PCI: PM: Add special case handling for PCIe device wakeup"
Previous message: Riccardo Mancini: "Re: [PATCH 3/3] perf test: Be more consistent in use of TEST_*"
In reply to: Pekka Paalanen: "Re: [RFC 0/4] dma-fence: Deadline awareness"
Next in thread: Pekka Paalanen: "Re: [RFC 0/4] dma-fence: Deadline awareness"

Am 29.07.21 um 14:49 schrieb Pekka Paalanen:

On Thu, 29 Jul 2021 13:43:20 +0200
Christian König <christian.koenig@xxxxxxx> wrote:

Am 29.07.21 um 13:00 schrieb Pekka Paalanen:

On Thu, 29 Jul 2021 12:14:18 +0200
Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote:

Am 29.07.21 um 11:15 schrieb Pekka Paalanen:

If the app happens to be frozen (e.g. some weird bug in fence handling
to make it never ready, or maybe it's just bugged itself and never
drawing again), then the app is frozen, and all the rest of the desktop
continues running normally without a glitch.

But that contradicts what you told me before.

See, when the window should move but fails to draw its new content, what
happens?

Are the other windows which would be affected by the move not drawn as well?

No, all the other windows will continue behaving normally just like
they always did. It's just that one frozen window there that won't
update; it won't resize, so there is no reason to move that other
window either.

Everything continues as if the frozen window never even sent anything
to the compositor after its last good update.

We have a principle in Wayland: the compositor cannot afford to wait
for clients, the desktop as a whole must remain responsive. So there is
always a backup plan even for cases where the compositor expects the
client to change something. For resizes, in a floating-window manager
it's easy: just let things continue as they are; in a tiling window
manager they may have a timeout after which... whatever is appropriate.

Another example: If a compositor decides to make a window maximized, it
tells the client the new size and state it must have. Until the client
acks that specific state change, the compositor will continue managing
that window as if nothing changed. Given the asynchronous nature of
Wayland, the client might even continue submitting updates
non-maximized for a while, and that will go through as if the
compositor didn't ask for maximized. But at some point the client acks
the window state change, and from that point on if it doesn't behave
like maximized state requires, it will get a protocol error and be
disconnected.
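
To make the maximize example above concrete, here is a minimal toy model of the request/ack behaviour it describes. It is only a sketch: the names `Window`, `request_state`, `ack`, `commit` and `ProtocolError` are hypothetical and are not the Wayland API, and the real protocol carries far more state than a single string.

```python
from dataclasses import dataclass
from typing import Optional


class ProtocolError(Exception):
    """Raised when a client violates a state it has already acked."""


@dataclass
class Window:
    # State the compositor currently manages the window with.
    current_state: str = "floating"
    # State the compositor has requested but the client has not acked yet.
    pending_state: Optional[str] = None

    def request_state(self, state: str) -> None:
        # Compositor asks for a change but keeps managing the old state.
        self.pending_state = state

    def ack(self) -> None:
        # Client acknowledges the request; from now on it is binding.
        if self.pending_state is not None:
            self.current_state = self.pending_state
            self.pending_state = None

    def commit(self, buffer_state: str) -> str:
        # Updates must match the last acked state, otherwise the client
        # gets a protocol error (and would be disconnected).
        if buffer_state != self.current_state:
            raise ProtocolError(
                f"expected {self.current_state}, got {buffer_state}"
            )
        return buffer_state
```

In this model a client may keep committing "floating" updates after `request_state("maximized")` and they go through unchanged; only after `ack()` does a non-maximized commit raise `ProtocolError`.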

Yeah and all of this totally makes sense.

The problem is that not forwarding the state changes to the hardware
adds a CPU round trip which is rather bad for the driver design,
especially power management.

E.g. when you submit the work only after everybody becomes available the
GPU becomes idle in between and might think it is a good idea to reduce
clocks etc...

Everybody does not need to be available. The compositor can submit its
work anyway, it just uses old state for some of the windows.

But if everybody happens to be ready before the compositor repaints,
then the GPU will be idle anyway, whether the compositor looked at the
buffer readiness at all or not.

Ok good point.

Given that Wayland clients are not expected (but can if they want) to
draw again until the frame callback which ensures that their previous
frame is definitely going to be used on screen, this idling of GPU
might happen regularly with well-behaved clients I guess?

Maybe I wasn't clear about what the problem is: that the GPU goes idle is expected, but it should not go idle multiple times per frame.

The aim is that co-operative clients never draw a frame that will only
get discarded.

How about doing this instead:

1. As soon as at least one window has new committed state you submit the
rendering.
As far as I understand it that is already the case anyway.

At least Weston does not work like that. Doing that means that the
first client to send a new frame will lock all other client updates out
of that update cycle.

Hence, a compositor usually waits until some point before the target
vblank before it starts the repaint, which locks the window state in
place for the frame.

Uff, that means we have lost this game anyway.

See, you get the best energy utilization if the hardware wakes up as few times as possible and still gets everything done.

So what happens in the case you describe is that the hardware comes out of sleep at least twice, once for the client and once for the server, which is rather suboptimal.

Any client update could contain window state changes that prevent the
GPU from choosing the content buffer to use.

2. Before starting rendering the hardware driver waits with a timeout
for all the window content to become ready.
The timeout is picked in a way so that we at least reach a
reasonable fps. Making that depend on the maximum refresh rate of the
display device sounds reasonable to me.

3a. If all windows become ready on time we draw the frame as expected.
3b. If a timeout occurs the compositor is notified of this and goes on a
fallback path rendering only the content known to be ready.
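
Steps 2 to 3b above can be sketched as a small simulation. This is not driver code: `frame_timeout_ms` and `compose` are hypothetical names, and using exactly one refresh period of the maximum refresh rate as the timeout is an assumption made for illustration, since the text only says the timeout should depend on that rate.

```python
from typing import Dict, Optional, Tuple


def frame_timeout_ms(max_refresh_hz: float) -> float:
    # Assumption: allow one refresh period at the maximum refresh rate.
    return 1000.0 / max_refresh_hz


def compose(
    ready_at_ms: Dict[str, Optional[float]], timeout_ms: float
) -> Tuple[str, Dict[str, str]]:
    """Decide per window whether new or old content is used.

    ready_at_ms maps a window name to the time its new content becomes
    ready, or None if it never becomes ready (e.g. a hung client).
    """
    chosen = {}
    for name, t in ready_at_ms.items():
        on_time = t is not None and t <= timeout_ms
        # Late or hung windows fall back to their last good content.
        chosen[name] = "new" if on_time else "old"
    # 3a: everything on time -> normal path; 3b: otherwise fallback.
    all_new = all(v == "new" for v in chosen.values())
    return ("normal" if all_new else "fallback"), chosen
```

For example, `compose({"a": 5.0, "b": None}, frame_timeout_ms(60.0))` takes the fallback path and keeps the old content for the hung window "b" only.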

Sounds like the fallback path, where the compositor's rendering is
already late, would need to re-do all the rendering with an extremely
tight schedule just before the KMS submission deadline. IOW, when
you're not going to make it in time, you have to do even more work and
ping-pong even more between CPU and GPU after being a bit late already.
Is that really a good idea?

My idea is that both the fallback path and the normal rendering are submitted at the same time, just with a big if/then/else around them. I.e. the timeout happens on the GPU hardware and not on the CPU.

But I think that stuff is just too complicated to implement.

I want to describe once more what the ideal configuration would be:
1. When you render a frame one or more clients submit jobs to the hardware.
2. Those jobs then execute on the hardware asynchronously to the CPU.
3. At the same time the CPU prepares a composition job which takes all the window content from clients and renders a new frame.
4. This new frame gets submitted to the hardware driver as new content on the screen.
5. The hardware driver waits for all the rendering to be completed and flips the screen.

The idea is that you have only one block of activity on the hardware, e.g. something like this:
_------------_______flip_-------------_____flip.....

But what happens with Wayland currently is that you end up with:
_--------_______-__flip_------------___-__flip.....

Or even worse when you have multiple clients rendering at random times:
_---_---_---____-__flip_---_---_---___-__flip.....
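
To put numbers on the three timelines above, the following sketch counts how many separate blocks of activity occur before each flip. It assumes that `-` means the hardware is busy and `_` means it is idle; the function name `wakeups_per_frame` and the constant names are hypothetical.

```python
import re
from typing import List

# Timelines copied from the diagrams above.
IDEAL = "_------------_______flip_-------------_____flip....."
CURRENT = "_--------_______-__flip_------------___-__flip....."
SCATTERED = "_---_---_---____-__flip_---_---_---___-__flip....."


def wakeups_per_frame(timeline: str) -> List[int]:
    # Each frame is the text before a "flip"; drop the trailing remainder.
    frames = timeline.split("flip")[:-1]
    # Every maximal run of '-' is one wake-up of the hardware.
    return [len(re.findall(r"-+", frame)) for frame in frames]
```

With these inputs the ideal case gives one wake-up per frame, the current Wayland case two, and the case with multiple clients rendering at random times four.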

I'm actually not that much of a power management guy, but it is rather obvious that this is not optimal.

Regards,
Christian.

It also means the compositor cannot submit the KMS atomic commit until
the GPU has either finished or timed out the compositing job, which is
another GPU-CPU ping-pong.

4. Repeat.

This way we should be able to handle all use cases gracefully, e.g. the
hanging client won't cause the server to block and when everything
becomes ready on time we just render as expected.

Thanks,
pq
