RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser

From: Brown, Len
Date: Mon Aug 14 2023 - 10:49:35 EST


> What's the problem you are trying to solve? Some real world issue or some academic though experiment which might never become a real problem?

There are several existing bugs, bad practices, and latent future bugs in today's x86 topology code.

First, with multiple cores having the same core_id.
A consumer of core_id must know about packages to understand core_id.

This is the original sin of the current interface -- which should never have used the word "sibling" *anyplace*,
Because to make sense of the word sibling, you must know what *contains* the siblings.

We introduced "core_cpus" a number of years ago to address this for core_ids (and for other levels,
Such as die_cpus). Unfortunately, we can probably never actually delete threads_siblings and core_siblings
Without breaking some program someplace...

Core_id should be system-wide global, just like the cpu number is system wide global.
Today, it is the only level id that is not system wide global.
This could be implemented by simply not masking off the package_id bits when creating the core_id,
Like we have done for other levels.
Yes, this could be awkward for some existing code that indexes arrays with core_id, and doesn't like them to be sparse.
But that rough edge is a much smaller problem than having to comprehend a level (package) that you may
Otherwise not care about. Besides, core_id's can already be sparse today.

Secondly, with the obsolescence of CPUID.0b and its replacement with CPUID.1F, the contract between
The hardware and the software is that a level can appear and can in between any existing levels.
(the only exception is that SMT is married to core). It is not possible
For an old kernel to know the name or position of a new level in the hierarchy, going forward.

Today, this manifests with a (currently) latent bug that I caused. For I implemented die_id
In the style of package_id, and I shouldn't have followed that example.
Today, if CPUID.1F doesn't know anything about multiple DIE, Linux conjurs up
A die_id 0 in sysfs. It should not. The reason is that when CPUID.1F enumerates
A level that legacy code doesn't know about, we can't possibly tell if it is above DIE,
or below DIE. If it is above DIE, then our default die_id 0 is becomes bogus.

That said, I have voiced my objection inside Intel to the creation of random levels
Which do not have an architectural (software) definition; and I'm advocating that
They be *removed* from the SDM until a software programming definition that
Spans all generation is documented.

SMT, core, module, die and the (implicit) package may not be well documented,
But they do have existing uses and will thus live on.
The others maybe not.

-Len