Re: [PATCH v4] arm64/fpsimd: Suppress SVE access traps when loading FPSIMD state

From: Mark Brown
Date: Wed Mar 06 2024 - 17:44:43 EST


On Wed, Mar 06, 2024 at 06:54:51PM +0000, Catalin Marinas wrote:
> On Mon, Jan 22, 2024 at 07:42:14PM +0000, Mark Brown wrote:

> > This indicates that there should be some useful benefit from reducing the
> > number of SVE access traps for blocking system calls like we did for non
> > blocking system calls in commit 8c845e273104 ("arm64/sve: Leave SVE enabled
> > on syscall if we don't context switch"). Let's do this by counting the
> > number of times we have loaded FPSIMD only register state for SVE tasks
> > and only disabling traps after some number of times, otherwise leaving
> > traps disabled and flushing the non-shared register state like we would on
> > trap.

> It looks like some people complain about SVE being disabled, though I
> assume this is for kernels prior to 6.2 and commit 8c845e273104
> ("arm64/sve: Leave SVE enabled on syscall if we don't context switch"):

> https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1999551/comments/52

Yes, from a user perspective this is a case where they're running what
they see as a perfectly standard instruction and finding that it might
take thousands of cycles longer than you'd expect it to while we take a
trip through EL1. If this happens frequently then it becomes much
harder to justify using SVE, especially for things that run for a short
time or which don't have an overwhelming performance advantage over a
non-SVE implementation.

> I assume we may see the other camp complaining about the additional
> state saving on context switch.

Yes, in the case where things get preempted that's an issue for tasks
that are mostly FPSIMD only otherwise we'd be better off just never
disabling SVE after it's been enabled and we had to allocate the
storage.

> Anyway, I don't see why we should treat blocking syscalls differently
> from non-blocking ones (addressed by the commit above). I guess the
> difference is that one goes through a context switch but, from a user
> perspective, it's still a syscall. The SVE state is expected to be
> discarded and there may be a preference for avoiding the subsequent
> fault.

To me this is purely about the fault behaviour, the fact that this is
related to the state saving/loading is more of an implementation detail
than

> > I pulled 64 out of thin air for the number of flushes to do, there is
> > doubtless room for tuning here. Ideally we would be able to tell if the
> > task is actually using SVE but without using performance counters (which
> > would be substantial work) we can't currently tell. I picked the number
> > because so many of the tasks using SVE used it so frequently.

> So I wonder whether we should make the timeout disabling behaviour the
> same for both blocking and non-blocking syscalls. IOW, ignore the
> context switching aspect. Every X number of returns, disable SVE
> irrespective of whether it was context switched or not. Or, if the
> number of returns has a variation in time, just use a jiffy (or other
> time based figure), it would time out in the same way for all types of
> workloads.

I think of those approaches a time based one seems more appealing -
we're basically just using the number of times we needed to reload the
state as a proxy for "has been running for a while" here since it's
information that's readily to hand. It would penalize tasks that spend
a lot of time blocked.

I'd still be inclined to only actually check when loading the state
simply because otherwise you can't avoid the overhead by doing something
like pinning your sensitive task to a CPU so it never gets scheduled
away which seems like a reasonable thing to optimise for. That would
mean a task that is only ever context switched through preemption would
never drop SVE state but I think that if your task is using that much
CPU the cost of saving the extra state on context switch is kind of in
the noise.

Attachment: signature.asc
Description: PGP signature