ARM SVE ABI: kernel dropping SVE/SME state on syscalls

From: Vineet Gupta
Date: Wed Mar 27 2024 - 20:30:08 EST


Hi Will, Marc,

In the RISC-V land we are hitting an issue and need some help
understanding the SVE ABI about dropping the state on syscalls (and its
implications etc - in hindsight)

If I'm reading the arm64 code correctly, SVE state is unconditionally
(for any syscall whatsoever) dropped in following code path:

el0_svc
    fp_user_discard

The RISC-V Vector ABI mandates something similar and kernel implements
something similar.

    2023-06-29 9657e9b7d253 riscv: Discard vector state on syscalls  

However in recent testing with RISC-V vector builds we are running into
an issue when this just doesn't work.

Just for some background, RISC-V vector instructions relies on
additional state in a VTYPE register which is setup using an apriori
VSETVLI insn.
So consider the following piece of code:

   3ff80:    cc787057              vsetivli    zero,16,e8,mf2,ta,ma    
<-- sets up VTYPE
   3ff84:    44d8                    lw    a4,12(s1)
   3ff86:    449c                    lw    a5,8(s1)
   3ff88:    06f75563              bge    a4,a5,3fff2
   3ff8c:    02010087              vle8.v    v1,(sp)
   3ff90:    020980a7              vse8.v    v1,(s3)   <-- Vector store
instruction
Here's the sequence of events that's causing the issue 1. The vector
store instruction (in say bash) takes a page fault, enters kernel.
2. In PF return path, a SIGCHLD signal is pending (a bash sub-shell
which exited, likely on different cpu).
3. kernel resumes in userspace signal handler which ends up making an
rt_sigreturn syscall - and which as specified discards the V state (and
makes VTYPE reg invalid).
4. When sigreturn finally returns to original Vector store instruction,
invalid VTYPE triggers an Illegal instruction which causes a SIGILL (as
state was discarded above).

So there is no way dropping syscall state would work here.

How do you guys handle this for SVE/SME ? One way would be to not do the
discard in rt_sigreturn codepath, but I don't see that - granted I'm not
too familiar with arch/arm64/*/**

Other thing I wanted to ask is, have there been any perf implications of
this ABI decision: as in if this was other way around, userspace (and/or
compilers) could potentially leverage the fact that SVE/SME state would
still be valid past a syscall - and won't have to reload/resetup etc.

Thanks,
-Vineet