Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

From: Dave Hansen
Date: Thu Mar 14 2024 - 13:10:32 EST


On 3/14/24 09:45, John Baldwin wrote:
> On 3/14/24 8:37 AM, Dave Hansen wrote:
>> On 3/14/24 04:23, Vignesh Balasubramanian wrote:
>>> Add a new .note section containing type, size, offset and flags of
>>> every xfeature that is present.
>>
>> Mechanically, I'd much rather have all of that info in the cover letter
>> in the actual changelog instead.
>>
>> I'd also love to see a practical example of what an actual example core
>> dump looks like on two conflicting systems:
>>
>>     * Total XSAVE size
>>     * XCR0 value
>>     * XSTATE_BV from the core dump
>>     * XFEATURE offsets for each feature
>
> I noticed this when I bought an AMD Ryzen 9 5900X based system for
> my desktop running FreeBSD and found that the XSAVE core dump notes
> were not recognized by GDB (FreeBSD dumps an XSAVE register set note
> that matches the same layout of NT_X86_XSTATE used by Linux).

I just want to make sure that you heard what I asked. I'd like to see a
practical example of how the real-world enumeration changes between two
real world systems.

Is that possible?

Here's the raw CPUID data from the XSAVE region on my laptop:

> 0x0000000d 0x00: eax=0x000002e7 ebx=0x00000a88 ecx=0x00000a88 edx=0x00000000
> 0x0000000d 0x01: eax=0x0000000f ebx=0x00000998 ecx=0x00003900 edx=0x00000000
> 0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
> 0x0000000d 0x05: eax=0x00000040 ebx=0x00000440 ecx=0x00000000 edx=0x00000000
> 0x0000000d 0x06: eax=0x00000200 ebx=0x00000480 ecx=0x00000000 edx=0x00000000
> 0x0000000d 0x07: eax=0x00000400 ebx=0x00000680 ecx=0x00000000 edx=0x00000000
> 0x0000000d 0x08: eax=0x00000080 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
> 0x0000000d 0x09: eax=0x00000008 ebx=0x00000a80 ecx=0x00000000 edx=0x00000000
> 0x0000000d 0x0b: eax=0x00000010 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
> 0x0000000d 0x0c: eax=0x00000018 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
> 0x0000000d 0x0d: eax=0x00000008 ebx=0x00000000 ecx=0x00000001 edx=0x00000000

Could we get that for an impacted AMD system, please?

cpuid -1 --raw | grep " 0x0000000d "

should do it.

> In particular, as the cover letter notes, on this AMD processor,
> there is no "gap" for MPX, so the PKRU registers it provides are at a
> different offset than on Intel CPUs. Furthermore, my reading of the
> SDM is that there is no guarantee of architectural offsets of a given
> XSAVE feature and that software should be querying CPUID to determine
> the layout.

I'd say the SDM is an aspirational document. Intel _aspires_ to be able
to change the layouts whenever it wants. That doesn't mean that it
could actually pull it off in practice.

In practice, the offset are fixed and Intel can't change them.

> FWIW, the relevant CPUID leaves for my AMD system:
>
>    XSAVE features (0xd/0):
>       XCR0 valid bit field mask               = 0x0000000000000207>          x87 state                            = true
..

So, those actually aren't the relevant ones. We need EAX (size) and EBX
(user offset) from all of the sub-leaves.

>> This is pretty close to just a raw dump of the XSAVE CPUID leaves.
>> Rather than come up with an XSAVE-specific ABI that depends on CPUID
>> *ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
>> should just bite the bullet and dump out (some of) the raw CPUID space.
>
> So the current note I initially proposed and implemented for FreeBSD
> (https://reviews.freebsd.org/D42136) and an initial patch set for GDB
> (https://sourceware.org/pipermail/gdb-patches/2023-October/203083.html)
> do indeed dump a raw set of CPUID leaves.  The version I have for FreeBSD
> only dumps the raw leaf values for leaf 0x0d though the note format is
> extensible should additional leaves be needed in the future.  One of the
> questions if we wanted to use a CPUID leaf note is which leaves to dump
> (e.g. do you dump all of them, or do you just dump the subset that is
> currently needed).

You dump what is needed and add to the dump over time.

> Another quirky question is what to do about systems with hetergeneous
> cores (E vs P for example).
That's irrelevant for now. The cores may be heterogeneous but the
userspace ISA and (and thus XSAVE formats) are identical. If they're
not, then we have bigger problems on our hands.

> Currently those systems use the same XSAVE layout across all cores,
> but other CPUID leaves do already vary across cores on those systems.

There shouldn't be any CPUID leaves that differ _and_ matter to
userspace and thus core dumps.

> However, there are other wrinkles with the leaf approach.  Namely, one
> of the use cases that I currently have an ugly hack for in GDB is if
> you are using gdb against a remote host running gdbserver and then use
> 'gcore' to generate a core dump.  GDB needs to write out a NT_X86_XSTATE
> note, but that note requires a layout.  What GDB does today is just pick
> a known Intel layout based on the XCR0 mask.  However, GDB should ideally
> start writing out whatever new note we adopt here, so if we dump raw
> CPUID leaves it means extending the GDB remote protocol so we can query
> the CPUID leaves from the remote host.  On the other hand, if we choose a
> more abstract format as proposed in this patch, the local GDB (or LLDB
> or whatever) can generate whatever synthetic layout it wants to write
> the local NT_X86_XSTATE.  (NB: A relevant detail here is that the GDB
> remote protocol does not pass the entire XSAVE state across as a block,
> instead gdbserver parses individual register values for AVX, etc.
> registers and those decoded register values are passed over the
> protocol.)

So the gdb side says, "Give me PKRU" and the remote side parses the
XSAVE image, finds PKRU, and sends it over the wire?

> Another question is potentially supporting compact XSAVE format in
> for NT_X86_XSTATE.  Today Linux has some complicated code to re-expand
> the compat XSAVE format back out to the standard layout for ptrace() and
> process core dumps.

Yeah, but supporting the compacted format in NT_X86_XSTATE doesn't help
us at all. We still intermingle user and supervisor state and that
needs to get repacked _anyway_.

In other words, no matter what we do, it's going to be complicated
because the userspace buffer can't have supervisor state and the kernel
buffer does have it. The compacted format mismatch is the least of our
problems.

>  (FreeBSD doesn't yet make use of XSAVEC so we
> haven't yet dealt with that problem.) 

.. or XSAVES, which is actually the most relevant here.

Backing up... there are two approaches here:

1. Dump out raw x86-specific gunk, aka. CPUID contents itself. There
are a billion ways to do this and lots of complications, including
the remote protocol implications
or
2. Define an abstract format that works anywhere, not just on x86 and
not just for XSAVE.

There's no (sane) middle ground. The implementation here (in this
patch) is fundamentally x86-specific and pretends to be some kind of
abstracted x86-independent format.