Re: RISCV Vector unit disabled by default for new task (was Re: [PATCH v12 17/17] riscv: prctl to enable vector commands)

From: Andrew Pinski
Date: Thu Dec 15 2022 - 14:00:08 EST


On Thu, Dec 15, 2022 at 10:57 AM Vineet Gupta <vineetg@xxxxxxxxxxxx> wrote:
>
> On 12/15/22 07:33, Richard Henderson wrote:
> > On 12/15/22 04:28, Florian Weimer via Libc-alpha wrote:
> >> * Björn Töpel:
> >>
> >>>> For SVE, it is in fact disabled by default in the kernel. When a thread
> >>>> executes the first SVE instruction, it will cause an exception, the kernel
> >>>> will allocate memory for SVE state and enable TIF_SVE. Further use of SVE
> >>>> instructions will proceed without exceptions. Although SVE is disabled by
> >>>> default, it is enabled automatically. Since this is done automatically
> >>>> during an exception handler, there is no opportunity for memory allocation
> >>>> errors to be reported, as there are in the AMX case.
> >>>
> >>> Glibc has an SVE optimized memcpy, right? Doesn't that mean that pretty
> >>> much all processes on an SVE capable system will enable SVE (lazily)? If
> >>> so, that's close to "enabled by default" (unless SVE is disabled system
> >>> wide).
> >>
> >> Yes, see sysdeps/aarch64/multiarch/memcpy.c:
> >>
> >> static inline __typeof (__redirect_memcpy) *
> >> select_memcpy_ifunc (void)
> >> {
> >>   INIT_ARCH ();
> >>   if (sve && HAVE_AARCH64_SVE_ASM)
> >>     {
> >>       if (IS_A64FX (midr))
> >>         return __memcpy_a64fx;
> >>       return __memcpy_sve;
> >>     }
> >>   if (IS_THUNDERX (midr))
> >>     return __memcpy_thunderx;
> >>   if (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr))
> >>     return __memcpy_thunderx2;
> >>   if (IS_FALKOR (midr) || IS_PHECDA (midr))
> >>     return __memcpy_falkor;
> >>   return __memcpy_generic;
> >> }
> >> And the __memcpy_sve implementation actually uses SVE.
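
For readers who want to see the selection pattern outside glibc, here is a
minimal sketch in plain C. It is not the glibc code: it uses an ordinary
function pointer rather than a real ifunc, the names copy_generic, copy_sve,
cpu_has_sve and select_copy are hypothetical, and both copy routines are
stand-ins that simply forward to memcpy. The only real interfaces assumed are
getauxval(AT_HWCAP) and the arm64 HWCAP_SVE bit.

```c
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* arm64 hwcap bit for SVE; fallback in case the headers do not define it. */
#ifndef HWCAP_SVE
# define HWCAP_SVE (1UL << 22)
#endif

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical stand-ins: a real build would provide distinct
   implementations; here both just call memcpy. */
void *copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

void *copy_sve(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Report whether the kernel advertises SVE to this process.
   Always 0 when not compiled for arm64. */
int cpu_has_sve(void)
{
#if defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;
#endif
}

/* Pick a copy routine once, based on the advertised CPU features. */
copy_fn select_copy(void)
{
    return cpu_has_sve() ? copy_sve : copy_generic;
}
```

A caller would do `copy_fn f = select_copy();` once and then call
`f(dst, src, n)`. Note that the hwcap check only says SVE is available; per
the description quoted above, the per-thread state is still set up lazily on
first use.
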
> >>
> >> If there were a prctl to select the vector width and enable the vector
> >> extension, we'd have to pick a width in glibc anyway.
> >
> > There *is* a prctl to adjust the SVE vector width, but glibc does not
> > need to select because SVE dynamically adjusts to the currently
> > enabled width. The kernel selects a default width that fits within
> > the default signal frame size.
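
As a rough illustration of the prctl mentioned here, the sketch below reads
the calling thread's current SVE vector length. It assumes the Linux
PR_SVE_GET_VL option and PR_SVE_VL_LEN_MASK from <linux/prctl.h>; the helper
name current_sve_vl_bytes is made up for this example.

```c
#include <sys/prctl.h>

/* Values from the Linux uapi header <linux/prctl.h>; defined here only
   as a fallback for older headers. */
#ifndef PR_SVE_GET_VL
# define PR_SVE_GET_VL 51
#endif
#ifndef PR_SVE_VL_LEN_MASK
# define PR_SVE_VL_LEN_MASK 0xffff
#endif

/* Return the current SVE vector length in bytes for the calling thread,
   or -1 if the prctl fails (for example, no SVE support). */
int current_sve_vl_bytes(void)
{
    int ret = prctl(PR_SVE_GET_VL);

    if (ret < 0)
        return -1;
    /* The low 16 bits hold the length; higher bits carry flags. */
    return ret & PR_SVE_VL_LEN_MASK;
}
```

On a kernel or CPU without SVE the call fails and the helper returns -1, so
it can double as a crude availability check.
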
> >
> > The other thing of note for SVE is that, with the default function ABI
> > all of the SVE state is call-clobbered, which allows the kernel to
> > drop instead of save state across system calls. (There is a separate
> > vector function call ABI when SVE types are used.)
>
> For the RV psABI, it is similar - all V regs are
> caller-saved/call-clobbered [1] and syscalls are not required to
> preserve V regs [2].
> However, last I checked the ARM documentation, the ABI doc seemed to suggest
> that some (parts) of the SVE regs are callee-saved [3].

Yes, the lower 64 bits of z8-z15, which overlap with the floating-point registers.

Thanks,
Andrew Pinski


>
> >
> > So while strcpy may enable SVE for the thread, the next syscall may
> > disable it again.
>
> Next syscall could trash them, but will it disable SVE? Despite
> syscall/function-call clobbers, using V in tight loops such as mem*/str*
> is still a win.
>
>
> [1]
> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
> [2]
> https://github.com/riscv/riscv-v-spec/blob/master/calling-convention.adoc
> [3]
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs64/aapcs64.rst#the-base-procedure-call-standard
> Sec 6.1.3 ".... In other cases it need only preserve the low 64 bits of
> z8-z15"
>