RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser

From: Brown, Len
Date: Tue Aug 15 2023 - 15:31:21 EST


Hello Thomas,

It seems we need to take a momentary step back to step forward...

First, the Intel CPUID context...

Even though CPUID.B was created to be extensible, we found that adding a "die" level to it would break legacy software.
That is because some legacy software did silly things, such as hard-coding that package level is always adjacent to the core level...
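
To make that failure concrete, here is a rough sketch in Python rather than kernel C. The level-type numbers follow the SDM; the shift values, the APIC ID, and the function names are invented for illustration and are not taken from any real parser.

```python
# Level types as numbered in the SDM for the CPUID topology leaves.
SMT, CORE, DIE = 1, 2, 5

def legacy_package_id(levels, apic_id):
    """Broken legacy logic: assumes the package starts right above core."""
    core_shift = dict(levels)[CORE]
    return apic_id >> core_shift

def package_id(levels, apic_id):
    """The package starts above the highest enumerated level, whatever it is."""
    top_shift = max(shift for _level_type, shift in levels)
    return apic_id >> top_shift

# Hypothetical part: 2 threads per core, 8 cores per die, 2 die per package.
# Each entry is (level_type, shift), where shift is the number of APIC ID
# bits occupied by that level and everything below it.
levels = [(SMT, 1), (CORE, 4), (DIE, 5)]

# Bits, high to low: package=1, die=1, core=0b010, thread=1.
apic_id = 0b110101

print(legacy_package_id(levels, apic_id))  # counts the die bit as package bits
print(package_id(levels, apic_id))
```

The legacy function is fine as long as nothing sits between core and package, and silently returns the wrong package number as soon as something does.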

Enter CPUID.1F -- an exact clone of CPUID.B, but with a new name. The new name guaranteed that the old broken software will not parse CPUID.1F, and gave Intel license to add levels to CPUID.1F at any time without confusing CPUID.1F parsing software. As 3-year-old kernels routinely run on the very latest hardware, this future-proof goal is paramount.

Multi-die/package systems shipped as the first test of CPUID.1F. Enumerating the multi-die/package was mostly about MSR scope....

In retrospect, we under-specified what it means to enumerate a CPUID.1F die, because it has been a constant battle to get the HW people to *not* enumerate hidden die that software does not see.

Indeed, we were equally guilty in not codifying an architectural definition of "module" and "tile", which were placed into the CPUID.1F definition mostly as place-holders with awareness of hardware structures that were already in common use. For example, there were already module-scoped counters that were hard-coded, and enumerating modules seems to be an attempt to give architectural (re-usable) enumeration to model-specific code.

Second, failings of the Linux topology code...

I agree with you that "thread_siblings" and "core_cpus" are different words for the same thing.
This will always be true because the hardware architecture guarantees that SMT siblings are the next level down from core.

But no such definition exists for "core_siblings". It is impossible to write correct software that reads "core_siblings" and takes any action on it. Those could be the CPUs inside a module, or inside a die, or inside some other level that today's software can't possibly know by name.

On the other hand, die_cpus is clear -- the CPUs within a die.
Package_cpus -- the CPUs within a package.
Core_cpus -- the CPUs within a core....
Words matter.

Specific replies....

Re: globally unique core_id

I have 100% confidence that you can make the Linux kernel handle a sparse globally unique core_id name space.
My concern is unknown exposure to joe-random-user-space program that consumes the sysfs representation.

>> Secondly, with the obsolescence of CPUID.0b and its replacement with
>> CPUID.1F, the contract between the hardware and the software is that a
>> level can appear in between any existing levels. (The only
>> exception is that SMT is married to core.)

> In theory, yes. But what's the practical relevance that there might be a new level between CORE and MODULE or MODULE and TILE etc...?

>> It is not possible For an old kernel to know the name or position of a
>> new level in the hierarchy, going forward.

>Again, where is the practical problem? These new levels are not going to be declared nilly willy and every other week, right?

It is irrelevant if a new level is of any practical use to Linux.

What is important is that Linux be able to parse and use the levels it finds useful, while gracefully ignoring any that it doesn't care about (or doesn't yet know about).
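
A minimal sketch of what "gracefully ignoring" could look like, again in Python for brevity. The table of known levels stands in for whatever a given kernel was built with; level type 9, the shift values, and the helper names are hypothetical.

```python
# Levels this (hypothetical) kernel knows by name, keyed by SDM level type.
KNOWN = {1: "smt", 2: "core", 3: "module", 4: "tile", 5: "die"}

def parse_levels(subleaves):
    """Map each known level to its APIC ID bit range.

    subleaves: (level_type, shift) pairs in subleaf order, lowest level first.
    Returns {name: (low_bit, high_bit_exclusive)}; "package" has no upper bound.
    """
    fields = {}
    low = 0
    for level_type, shift in subleaves:
        if level_type == 0:      # invalid type ends the enumeration
            break
        name = KNOWN.get(level_type)
        if name is not None:
            fields[name] = (low, shift)
        # Known or not, the level occupies these bits, so the next level
        # up starts above them. An unknown level is skipped, never guessed.
        low = shift
    fields["package"] = (low, None)
    return fields

def extract(apic_id, field):
    """Pull one level's ID out of an APIC ID."""
    low, high = field
    value = apic_id >> low
    if high is not None:
        value &= (1 << (high - low)) - 1
    return value

# Level type 9 is made up: a level this kernel has never heard of,
# sitting between core and die.
example = [(1, 1), (2, 4), (9, 5), (5, 7), (0, 0)]
fields = parse_levels(example)
```

The parser never needs the name of level 9, nor whether it is useful; it only needs to step over the bits it occupies.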

Yes, hardware folks can drop something into the ucode and the SDM w/o us knowing ahead of time (see DieGrp in the June 2023 SDM). Certainly they can do it in well under the 4 years' notice we'd need if we were to simply track the named levels in the SDM.

>> Today, this manifests with a (currently) latent bug that I caused.
>> For I implemented die_id in the style of package_id, and I shouldn't
>> have followed that example.

> You did NOT. You implemented die_id relative to the package, which does not make it unique in the same way as core_id is relative to the package and therefore not unique.

The point is that like package_id=0 on a single package system, I put a die_id=0 attribute in sysfs even when NO "die" level is enumerated in CPUID.1F.

That was a mistake.

>> Today, if CPUID.1F doesn't know anything about multiple DIE, Linux
>> conjures up a die_id 0 in sysfs. It should not. The reason is that
>> when CPUID.1F enumerates a level that legacy code doesn't know about,
>> we can't possibly tell if it is above DIE, or below DIE. If it is
>> above DIE, then our default die_id 0 becomes bogus.

>That's an implementation problem and the code I posted fixes this by making die_id unique and taking the documented domain levels into account.

Your code change does not fix the problem above.

>So if 0x1f does not enumerate dies, then each package has one die and the die ID is the same as the package ID. It's that simple.

Unfortunately, no.

Your code will be written and ship before level-X is defined.
A couple of years later, level-X is defined above die.
Your code runs on new hardware that defines no packages, level-X, and no die.
How many die-id's does this system have?

If you could see into the future, you'd answer that there are 2 die, because
there is one inside each level-X.

But since die isn't enumerated, and you don't know if a level-X is defined to be above or below die,
then you can't tell if level-X is something containing die, or something contained-by die...

The proper solution is to not expose a die_id attribute in sysfs if there is no die level enumerated in CPUID.1F.
When it is enumerated, we get it right. When it is not enumerated, we don't guess.
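
As a sketch of that rule (Python, not kernel code; the function name and the sample values are made up, and None stands for "do not create the sysfs attribute"):

```python
DIE = 5  # SDM level type for die

def die_id_or_none(subleaves, apic_id):
    """Return the die ID relative to the next enumerated level up,
    or None when CPUID.1F enumerates no die level at all.

    subleaves: (level_type, shift) pairs, lowest level first.
    """
    low = 0
    for level_type, shift in subleaves:
        if level_type == DIE:
            return (apic_id >> low) & ((1 << (shift - low)) - 1)
        low = shift
    # No die level: an unknown level may sit above or below where die
    # would be, so any default (such as 0) would be a guess.
    return None

# Level type 9 plays the role of level-X: defined after this code shipped.
with_level_x = [(1, 1), (2, 4), (9, 5)]
with_die = [(1, 1), (2, 4), (5, 5)]

print(die_id_or_none(with_level_x, 0b10101))  # None: no attribute, no guess
print(die_id_or_none(with_die, 0b10101))
```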

> What do you win by removing them from the SDM?

When you give HW people enough rope to hang themselves, they will.
Give them something vague in the SDM, and you've created a monster that is interpreted differently by different hardware teams and no validation team on the planet can figure out if the hardware is correct or not.
Then the definition becomes how the OS (possibly not Linux) happened to use that interface on some past chip -- and that use is not documented in the SDM -- and down the rabbit hole you go...

When the SDM precisely documents the software/hardware interface, then proper tests can be written, independent hardware teams are forced to follow the same definition, and correct software can be written once and never break.

-Len