Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

From: John Baldwin
Date: Thu Mar 14 2024 - 12:46:14 EST


On 3/14/24 8:37 AM, Dave Hansen wrote:
> On 3/14/24 04:23, Vignesh Balasubramanian wrote:
>> Add a new .note section containing type, size, offset and flags of
>> every xfeature that is present.
>
> Mechanically, I'd much rather have all of that info in the cover letter
> in the actual changelog instead.
>
> I'd also love to see a practical example of what an actual example core
> dump looks like on two conflicting systems:
>
> * Total XSAVE size
> * XCR0 value
> * XSTATE_BV from the core dump
> * XFEATURE offsets for each feature

I noticed this when I bought an AMD Ryzen 9 5900X-based system for my desktop
running FreeBSD and found that GDB did not recognize the XSAVE notes in its
core dumps (FreeBSD dumps an XSAVE register set note that uses the same
layout as Linux's NT_X86_XSTATE).
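For reference, here is a minimal Python sketch of walking an ELF PT_NOTE payload to find the NT_X86_XSTATE note and pull out the first three values Dave listed (total size, XCR0, XSTATE_BV). It assumes a little-endian core. It also assumes that XCR0 is stored in the first 8 bytes of the software-reserved area at byte 464 (the Linux ptrace/core-dump convention that GDB also reads, as far as I know) and that XSTATE_BV is the first 8 bytes of the XSAVE header at byte 512. The helper names are hypothetical.

```python
import struct
from typing import Dict, Iterator, Tuple

NT_X86_XSTATE = 0x202  # note type for the raw XSAVE area in x86 core dumps

XCR0_OFFSET = 464       # assumed: first u64 of the software-reserved bytes 464..511
XSTATE_BV_OFFSET = 512  # first u64 of the XSAVE header


def _align4(n: int) -> int:
    return (n + 3) & ~3


def iter_notes(buf: bytes) -> Iterator[Tuple[bytes, int, bytes]]:
    """Yield (name, type, desc) for each note in a little-endian PT_NOTE payload."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, off)
        off += 12
        name = buf[off:off + namesz].rstrip(b"\0")
        off += _align4(namesz)  # the name is padded to a 4-byte boundary
        desc = buf[off:off + descsz]
        off += _align4(descsz)  # and so is the descriptor
        yield name, ntype, desc


def xstate_summary(desc: bytes) -> Dict[str, int]:
    """Total size, XCR0 and XSTATE_BV from an NT_X86_XSTATE descriptor."""
    (xcr0,) = struct.unpack_from("<Q", desc, XCR0_OFFSET)
    (xstate_bv,) = struct.unpack_from("<Q", desc, XSTATE_BV_OFFSET)
    return {"size": len(desc), "xcr0": xcr0, "xstate_bv": xstate_bv}
```

The per-feature offsets, the fourth item on Dave's list, are exactly what this descriptor cannot tell you on its own.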

In particular, as the cover letter notes, this AMD processor has no "gap"
for MPX, so its PKRU register sits at a different offset than on Intel
CPUs. Furthermore, my reading of the SDM is that it does not guarantee an
architectural offset for any given XSAVE feature, and that software should
query CPUID to determine the layout.
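To make the offset problem concrete, here is a minimal sketch comparing a reader that hard-codes the Intel PKRU offset with one that uses the offset from the dumping machine. In practice that offset would come from CPUID leaf 0xD, sub-leaf 9, EBX. The Intel value 2688 is the usual standard-format offset. The 2432 value for the Ryzen system is an assumption, inferred from the CPUID dump below: the area is 2440 bytes and its last component, PKRU, is 8 bytes. The names are hypothetical.

```python
import struct
from typing import Dict, Optional

PKRU = 9  # XSAVE state component number for PKRU

# Standard-format offsets as CPUID.(EAX=0xD, ECX=9).EBX would report them.
INTEL_OFFSETS: Dict[int, int] = {PKRU: 2688}
ZEN3_OFFSETS: Dict[int, int] = {PKRU: 2432}  # assumed, inferred from the 2440-byte size


def read_pkru(xsave: bytes, offsets: Dict[int, int]) -> Optional[int]:
    """Read the 32-bit PKRU value, or None if the offset falls outside the dump."""
    off = offsets[PKRU]
    if off + 4 > len(xsave):
        return None  # the assumed layout does not even fit this dump
    (value,) = struct.unpack_from("<I", xsave, off)
    return value
```

With a hard-coded Intel layout, the reader cannot find PKRU at all in a 2440-byte dump. With a larger dump that merely uses a different layout, it would silently read the wrong bytes instead.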

FWIW, the relevant CPUID leaves for my AMD system:

    XSAVE features (0xd/0):
    XCR0 valid bit field mask = 0x0000000000000207
    x87 state = true
    SSE state = true
    AVX state = true
    MPX BNDREGS = false
    MPX BNDCSR = false
    AVX-512 opmask = false
    AVX-512 ZMM_Hi256 = false
    AVX-512 Hi16_ZMM = false
    PKRU state = true
    XTILECFG state = false
    XTILEDATA state = false
    bytes required by fields in XCR0 = 0x00000988 (2440)
    bytes required by XSAVE/XRSTOR area = 0x00000988 (2440)
    XSAVEOPT instruction = true
    XSAVEC instruction = true
    XGETBV instruction = true
    XSAVES/XRSTORS instructions = true
    XFD: extended feature disable supported = false
    SAVE area size in bytes = 0x00000348 (840)
    IA32_XSS valid bit field mask = 0x0000000000001800
    PT state = false
    PASID state = false
    CET_U user state = true
    CET_S supervisor state = true
    HDC state = false
    UINTR state = false
    LBR state = false
    HWP state = false
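As a quick cross-check of the dump above, here is a small sketch that decodes the two masks by XSAVE state-component bit number, following the SDM's component numbering. The table and function names are illustrative.

```python
from typing import List

# XSAVE state-component bit numbers (user and supervisor components).
XFEATURE_NAMES = {
    0: "x87", 1: "SSE", 2: "AVX", 3: "MPX BNDREGS", 4: "MPX BNDCSR",
    5: "AVX-512 opmask", 6: "AVX-512 ZMM_Hi256", 7: "AVX-512 Hi16_ZMM",
    8: "PT", 9: "PKRU", 10: "PASID", 11: "CET_U", 12: "CET_S",
    13: "HDC", 14: "UINTR", 15: "LBR", 16: "HWP",
    17: "XTILECFG", 18: "XTILEDATA",
}


def decode_mask(mask: int) -> List[str]:
    """Names of the components whose bits are set in mask, lowest bit first."""
    return [name for bit, name in sorted(XFEATURE_NAMES.items()) if (mask >> bit) & 1]


print(decode_mask(0x207))   # ['x87', 'SSE', 'AVX', 'PKRU']  (XCR0 valid mask)
print(decode_mask(0x1800))  # ['CET_U', 'CET_S']  (IA32_XSS valid mask)
```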

> Do you have any information about what other OSes are doing in this
> area? I thought Windows, for instance, was even less flexible about the
> XSAVE format than Linux is.

I have an implementation of a similar note for FreeBSD already as well as
patches for GDB to make use of the note (for FreeBSD) and generate it
via 'gcore' (on FreeBSD). However, I would very much like to reach
consensus on a shared format of the note to avoid gratuitous differences
between FreeBSD and Linux. The AMD folks were gracious enough to work on
the Linux kernel implementation. A bit more on that below though.

> Why didn't LWP cause this problem?

> From the cover letter:
>
>> But this patch series depends on heuristics based on the total XSAVE
>> register set size and the XCR0 mask to infer the layouts of the
>> various register blocks for core dumps, and hence, is not a foolproof
>> mechanism to determine the layout of the XSAVE area.
>
> It may not be theoretically foolproof. But I'm struggling to think of a
> case where it would matter in practice. Is there any CPU from any
> vendor where this is actually _needed_?
>
> Sure, it's ugly as hell, but these notes aren't going to be available
> universally _ever_, so it's not like the crummy heuristic code gets to
> go away.

Have you seen the APX spec?


https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

It makes this even more fun because it adds a new XSAVE state component,
but reuses the MPX offsets.
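A short sketch of why offset reuse hurts a decoder that maps byte ranges back to components without knowing which components are enabled. MPX BNDREGS at 960 and BNDCSR at 1024 (64 bytes each) are the usual standard-format values. Treating the APX state as component 19, occupying the same 128 bytes starting at 960, is an assumption based on "reuses the MPX offsets" and should be checked against the spec.

```python
from typing import Dict, Iterable, List, Tuple

# Standard-format (offset, size) per component. The MPX values are the usual
# ones; the APX entry (component 19 at 960, 128 bytes) is an assumption.
STD_LAYOUT: Dict[int, Tuple[int, int]] = {
    3: (960, 64),    # MPX BNDREGS
    4: (1024, 64),   # MPX BNDCSR
    19: (960, 128),  # APX extended GPRs (assumed)
}


def overlapping(layout: Dict[int, Tuple[int, int]],
                enabled: Iterable[int]) -> List[Tuple[int, int]]:
    """Pairs of enabled components whose byte ranges overlap."""
    comps = sorted(c for c in set(enabled) if c in layout)
    hits = []
    for i, a in enumerate(comps):
        a_off, a_size = layout[a]
        for b in comps[i + 1:]:
            b_off, b_size = layout[b]
            if a_off < b_off + b_size and b_off < a_off + a_size:
                hits.append((a, b))
    return hits
```

The bytes at 960..1087 alone do not say which component is present there. A decoder needs XCR0, or an explicit layout note, to tell MPX and APX state apart.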

>> This information will be used by the debuggers to understand the XSAVE
>> layout of the machine where the core file is dumped, and to read XSAVE
>> registers, especially during cross-platform debugging.
>
> This is pretty close to just a raw dump of the XSAVE CPUID leaves.
> Rather than come up with an XSAVE-specific ABI that depends on CPUID
> *ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
> should just bite the bullet and dump out (some of) the raw CPUID space.

So the current note I initially proposed and implemented for FreeBSD
(https://reviews.freebsd.org/D42136) and an initial patch set for GDB
(https://sourceware.org/pipermail/gdb-patches/2023-October/203083.html)
do indeed dump a raw set of CPUID leaves. The version I have for FreeBSD
only dumps the raw values for leaf 0x0d, though the note format is
extensible should additional leaves be needed in the future. One open
question with a CPUID leaf note is which leaves to dump (e.g. all of
them, or just the subset that is currently needed). Another quirky
question is what to do about systems with heterogeneous cores (E vs P,
for example). Currently those systems use the same XSAVE layout across
all cores, but other CPUID leaves already vary across cores on those
systems. Some options considered for that are to 1) use a separate note
type for "other" core types (e.g. a separate note type for "E" cores), or
2) make this new note a per-thread note that matches the core the given
thread was running on when its register state in the process core dump
was saved.
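Here is a minimal sketch of what a raw CPUID leaf note could look like. It uses one fixed-size record per (leaf, sub-leaf), so more leaves can be appended later. The record layout is hypothetical and is not the format in D42136. The leaf 0xD decoding follows the SDM: EAX is the component size, EBX is its standard-format offset, and ECX bit 1 requests 64-byte alignment in the compacted format. On heterogeneous systems, the same records could be emitted per thread.

```python
import struct
from typing import Dict, Tuple

Regs = Tuple[int, int, int, int]  # eax, ebx, ecx, edx
RECORD = struct.Struct("<6I")     # hypothetical: leaf, sub-leaf, eax, ebx, ecx, edx


def pack_leaves(leaves: Dict[Tuple[int, int], Regs]) -> bytes:
    """Serialize CPUID results into a note descriptor."""
    return b"".join(RECORD.pack(leaf, sub, *regs)
                    for (leaf, sub), regs in sorted(leaves.items()))


def unpack_leaves(desc: bytes) -> Dict[Tuple[int, int], Regs]:
    """Parse the descriptor back into {(leaf, sub-leaf): (eax, ebx, ecx, edx)}."""
    return {(leaf, sub): (eax, ebx, ecx, edx)
            for leaf, sub, eax, ebx, ecx, edx in RECORD.iter_unpack(desc)}


def component_layout(leaves: Dict[Tuple[int, int], Regs], comp: int) -> Dict[str, int]:
    """Decode CPUID.(EAX=0xD, ECX=comp) into size, offset and alignment."""
    eax, ebx, ecx, _ = leaves[(0xD, comp)]
    return {"size": eax, "offset": ebx, "align64": (ecx >> 1) & 1}
```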

However, there are other wrinkles with the leaf approach. Namely, one
use case for which I currently have an ugly hack in GDB is running gdb
against a remote host via gdbserver and then using 'gcore' to generate a
core dump. GDB needs to write out an NT_X86_XSTATE note, but that note
requires a layout. What GDB does today is just pick a known Intel layout
based on the XCR0 mask. However, GDB should ideally start writing out
whatever new note we adopt here, so if we dump raw CPUID leaves, the GDB
remote protocol would have to be extended so we can query the CPUID
leaves from the remote host. On the other hand, if we choose a more
abstract format as proposed in this patch, the local GDB (or LLDB or
whatever) can pick whatever synthetic layout it wants when writing the
local NT_X86_XSTATE note. (NB: A relevant detail here is that the GDB
remote protocol does not pass the entire XSAVE state across as a block;
instead, gdbserver parses individual register values for AVX, etc.
registers and those decoded register values are passed over the
protocol.)
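With a layout-descriptor note, a tool could write its own NT_X86_XSTATE for a remote target without querying CPUID. The idea is to pick a layout, describe it, and copy in the register bytes it already has. Here is a sketch of that flow for the extended components only. The 16-byte (type, size, offset, flags) record encoding and the back-to-back placement are assumptions for illustration, not the patch's exact format.

```python
import struct
from typing import Dict, Tuple

XFEAT_REC = struct.Struct("<4I")  # assumed encoding: type, size, offset, flags
LEGACY_AND_HEADER = 576           # 512-byte legacy region + 64-byte XSAVE header
XSTATE_BV_OFFSET = 512

Layout = Dict[int, Tuple[int, int]]  # component -> (offset, size)


def synth_layout(sizes: Dict[int, int]) -> Tuple[Layout, int]:
    """Place extended components back to back, in component order."""
    layout: Layout = {}
    off = LEGACY_AND_HEADER
    for comp in sorted(sizes):
        layout[comp] = (off, sizes[comp])
        off += sizes[comp]
    return layout, off


def layout_note(layout: Layout) -> bytes:
    """Describe the chosen layout, one record per component (flags left 0)."""
    return b"".join(XFEAT_REC.pack(comp, size, off, 0)
                    for comp, (off, size) in sorted(layout.items()))


def build_xsave(layout: Layout, total: int, values: Dict[int, bytes]) -> bytes:
    """Copy decoded register bytes into place and mark them present in XSTATE_BV."""
    buf = bytearray(total)
    xstate_bv = 0
    for comp, data in values.items():
        off, size = layout[comp]
        chunk = data[:size]  # never write past the component's slot
        buf[off:off + len(chunk)] = chunk
        xstate_bv |= 1 << comp
    struct.pack_into("<Q", buf, XSTATE_BV_OFFSET, xstate_bv)
    return bytes(buf)
```

Because the note travels with the data, a reader does not care which layout the writer picked. That is what would let the gcore-over-gdbserver case avoid the CPUID query entirely.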

Another question is whether to support the compacted XSAVE format in
NT_X86_XSTATE. Today Linux has some complicated code to re-expand the
compacted XSAVE format back out to the standard layout for ptrace() and
process core dumps. (FreeBSD doesn't yet make use of XSAVEC, so we
haven't had to deal with that problem.) The CPUID leaf approach would
allow us to support compacted formats, though GDB would have to check
the compacted-format flag (bit 63 of XCOMP_BV) in the XSAVE header to
decide which layout to use, etc. On the other hand, if we use the more
abstract format in this patch, then GDB wouldn't actually have to care
at all. The kernel would just dump a layout note describing the
compacted layout and use the raw XSAVEC output as the NT_X86_XSTATE
note. (I will probably do this in FreeBSD eventually, using a policy
knob (a sysctl on FreeBSD) to control whether it is enabled, with
FreeBSD defaulting it to on at some point in the future.)
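For the compacted case, here is a sketch of the SDM's compacted-format rule. Components present in XCOMP_BV (bytes 520..527) are packed in component order starting at byte 576. Each one is rounded up to a 64-byte boundary if CPUID.(EAX=0xD, ECX=i).ECX[1] says so. Bit 63 of XCOMP_BV marks the buffer as compacted. For the Ryzen's AVX plus PKRU this gives 576 + 256 + 8 = 840 bytes. That matches the XSAVES area size in the CPUID dump above, assuming no supervisor bits were enabled in IA32_XSS when it was taken.

```python
import struct
from typing import Dict, Tuple

XCOMP_BV_OFFSET = 520  # second u64 of the XSAVE header at byte 512
COMPACTED = 1 << 63    # XCOMP_BV[63] set: the buffer uses the compacted format


def is_compacted(xsave: bytes) -> bool:
    (xcomp_bv,) = struct.unpack_from("<Q", xsave, XCOMP_BV_OFFSET)
    return bool(xcomp_bv & COMPACTED)


def compacted_offsets(xcomp_bv: int,
                      info: Dict[int, Tuple[int, bool]]) -> Tuple[Dict[int, int], int]:
    """Offsets of extended components in a compacted buffer, plus its total size.

    info maps component -> (size, align64), i.e. CPUID.(EAX=0xD, ECX=i).EAX
    and ECX bit 1 as reported on the dumping machine.
    """
    offsets: Dict[int, int] = {}
    off = 576  # legacy region + header; extended state starts here
    for comp in range(2, 63):
        if not (xcomp_bv >> comp) & 1:
            continue  # component not present in this buffer
        size, align64 = info[comp]
        if align64:
            off = (off + 63) & ~63
        offsets[comp] = off
        off += size
    return offsets, off
```

In this format, PKRU lands at 832 rather than at its standard-format offset. A reader therefore needs either per-component sizes and alignment (the CPUID route) or the offsets themselves (the layout-note route).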

I don't really have a strong preference myself for which type of note to
dump; I really just want a shared format so that there is less work to
do on the tools side (e.g. GDB).

Also, FWIW, I did try to raise this topic on LLDB's discussion forums
and got a simple "sounds ok" type response but no detailed feedback.
That was a proposal for the CPUID leaf note, but I suspect LLDB will
be fine with either approach. Certainly I will update GDB to work
with whatever approach is adopted.

--
John Baldwin