Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

From: Arjan van de Ven
Date: Mon May 17 2021 - 09:49:15 EST


Having a proper interface (syscall, prctl) which user space can use to
ask for permission and allocation of the necessary buffer(s) is clearly
avoiding the downsides and provides the necessary mechanisms for proper
control and failure handling.

This would need to be a "get / put" interface, so a refcount; that way things nest nicely.
For API symmetry I'd want to have the put there, even if we may decide to be infinitely lazy
in cleaning up the state.

It would also want to take an argument that's a bitmask, so that this can be applied
to future state as well.

Actually, I'd start by also adding AVX512 to this. Even though, for obvious compat reasons,
that one is on by default (so at process start we might need to start with a count of 1),
it's interesting to fold that into this same framework.
(And who knows, dropping AVX512 state if you don't need it might improve context switches.)
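
To make the get/put idea concrete, here is a rough user-space model of the bookkeeping, not a proposed ABI. All names (xfeat_get, xfeat_put, the XFEAT_* bits, struct xfeat_state) are made up for illustration. It assumes one refcount per feature bit, with AVX512 starting at a count of 1:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only; not a real kernel ABI. */
#define XFEAT_AVX512 (UINT64_C(1) << 0)
#define XFEAT_AMX    (UINT64_C(1) << 1)
#define XFEAT_NBITS  2
#define XFEAT_ALL    ((UINT64_C(1) << XFEAT_NBITS) - 1)

/* One reference count per feature bit. */
struct xfeat_state {
	unsigned int refs[XFEAT_NBITS];
};

/* A mask is valid if it names at least one feature and no unknown ones. */
static int xfeat_mask_valid(uint64_t mask)
{
	return mask != 0 && (mask & ~XFEAT_ALL) == 0;
}

/* State at process start: AVX512 is on by default, so its count starts at 1. */
static void xfeat_init(struct xfeat_state *s)
{
	for (int i = 0; i < XFEAT_NBITS; i++)
		s->refs[i] = 0;
	s->refs[0] = 1;	/* bit 0 is XFEAT_AVX512 */
}

/* "get": take one reference on every feature named in mask. */
static int xfeat_get(struct xfeat_state *s, uint64_t mask)
{
	if (!xfeat_mask_valid(mask))
		return -EINVAL;
	for (int i = 0; i < XFEAT_NBITS; i++)
		if (mask & (UINT64_C(1) << i))
			s->refs[i]++;
	return 0;
}

/* "put": drop one reference per feature; an unbalanced put fails as a whole. */
static int xfeat_put(struct xfeat_state *s, uint64_t mask)
{
	if (!xfeat_mask_valid(mask))
		return -EINVAL;
	/* Check everything first so a failure leaves the counts untouched. */
	for (int i = 0; i < XFEAT_NBITS; i++)
		if ((mask & (UINT64_C(1) << i)) && s->refs[i] == 0)
			return -EINVAL;
	for (int i = 0; i < XFEAT_NBITS; i++) {
		if (mask & (UINT64_C(1) << i)) {
			assert(s->refs[i] > 0);	/* guaranteed by the check above */
			s->refs[i]--;
		}
	}
	return 0;
}

/* Features whose count is non-zero, i.e. whose state must be kept around. */
static uint64_t xfeat_enabled(const struct xfeat_state *s)
{
	uint64_t mask = 0;

	for (int i = 0; i < XFEAT_NBITS; i++)
		if (s->refs[i] != 0)
			mask |= UINT64_C(1) << i;
	return mask;
}
```

Nested users just pair each xfeat_get() with an xfeat_put(); a feature drops out of xfeat_enabled() only when its last reference goes away. A real kernel could still be lazy about when it actually frees the state.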

Syscalls are relatively cheap (and I can imagine the C library doing a TLS cache of the count
if it becomes an issue), so this can be done at a relatively fine-grained level.
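
One possible reading of the TLS cache, as a sketch only: the C library keeps a per-thread nesting depth and enters the kernel only on the 0 -> 1 and 1 -> 0 transitions. The fake_sys_* functions below stand in for the not-yet-existing syscall and just count calls; lib_amx_get/lib_amx_put and the XFEAT_AMX bit are likewise hypothetical, and the sketch assumes the kernel-side count is tracked per thread:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define XFEAT_AMX (UINT64_C(1) << 1)	/* hypothetical feature bit */

/* Counts simulated kernel entries; stands in for the real syscall cost. */
static unsigned long fake_syscalls;

/* Placeholder for the hypothetical "get" syscall. */
static int fake_sys_xfeat_get(uint64_t mask)
{
	(void)mask;
	fake_syscalls++;
	return 0;
}

/* Placeholder for the hypothetical "put" syscall. */
static int fake_sys_xfeat_put(uint64_t mask)
{
	(void)mask;
	fake_syscalls++;
	return 0;
}

/* Per-thread nesting depth, cached in TLS by the library. */
static _Thread_local unsigned int tls_amx_depth;

/* Library-level get: only the outermost get in a thread enters the kernel. */
static int lib_amx_get(void)
{
	if (tls_amx_depth == 0) {
		int ret = fake_sys_xfeat_get(XFEAT_AMX);

		if (ret)
			return ret;
	}
	assert(tls_amx_depth < UINT_MAX);	/* guard against counter overflow */
	tls_amx_depth++;
	return 0;
}

/* Library-level put: only the outermost put in a thread enters the kernel. */
static int lib_amx_put(void)
{
	if (tls_amx_depth == 0)
		return -EINVAL;	/* unbalanced put */
	if (--tls_amx_depth == 0)
		return fake_sys_xfeat_put(XFEAT_AMX);
	return 0;
}
```

With this, three nested get/put pairs in one thread cost two kernel entries in total rather than six.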

I've worked on OpenBLAS before, and that library basically has a global initialization function
that ends up getting called on the first big math op (it may spawn threads as well, etc.) but which
"stays around" for subsequent math functions; a get/put model would work quite well for such a math
library. (Since it's based on BLAS, like almost all such math libraries, I expect this to be the common
pattern.)