Re: [PATCH v4] arm64/fpsimd: Suppress SVE access traps when loading FPSIMD state

From: Catalin Marinas
Date: Wed Mar 06 2024 - 13:55:00 EST


On Mon, Jan 22, 2024 at 07:42:14PM +0000, Mark Brown wrote:
> This indicates that there should be some useful benefit from reducing the
> number of SVE access traps for blocking system calls like we did for non
> blocking system calls in commit 8c845e273104 ("arm64/sve: Leave SVE enabled
> on syscall if we don't context switch"). Let's do this by counting the
> number of times we have loaded FPSIMD only register state for SVE tasks
> and only re-enabling traps after some number of times, otherwise leaving
> traps disabled and flushing the non-shared register state like we would on
> trap.

It looks like some people complain about SVE being disabled, though I
assume this is for kernels prior to 6.2 and commit 8c845e273104
("arm64/sve: Leave SVE enabled on syscall if we don't context switch"):

https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1999551/comments/52

I assume we may see the other camp complaining about the additional
state saving on context switch.

Anyway, I don't see why we should treat blocking syscalls differently
from non-blocking ones (addressed by the commit above). I guess the
difference is that one goes through a context switch but, from a user
perspective, it's still a syscall. The SVE state is expected to be
discarded and there may be a preference for avoiding the subsequent
fault.

> I pulled 64 out of thin air for the number of flushes to do, there is
> doubtless room for tuning here. Ideally we would be able to tell if the
> task is actually using SVE but without using performance counters (which
> would be substantial work) we can't currently tell. I picked the number
> because so many of the tasks using SVE used it so frequently.

So I wonder whether we should make the timeout disabling behaviour the
same for both blocking and non-blocking syscalls. IOW, ignore the
context switching aspect. Every X number of returns, disable SVE
irrespective of whether it was context switched or not. Or, if counting
returns gives too much variation in time, just use a jiffy (or another
time-based figure) so that it times out in the same way for all types of
workloads.

--
Catalin