Re: [patch V4 24/41] x86/cpu: Provide cpu_init/parse_topology()

From: K Prateek Nayak
Date: Mon Aug 28 2023 - 10:29:06 EST


Hello Thomas,

On 8/28/2023 3:35 PM, Thomas Gleixner wrote:
> Prateek!
>
> On Mon, Aug 28 2023 at 11:37, K. Prateek Nayak wrote:
>> On 8/14/2023 2:24 PM, Thomas Gleixner wrote:
>>
>> Since these enums come from the description of level type of CPUID leaf
>> 0x1f, can we have a short description clarifying what each one signifies?
>> This will also help clarify the mappings for AMD's extended CPUID leaf
>> 0x80000026 (specifically for CCX and CCD level types). I had the
>> following in mind:
>
> Makes sense.
>
>> TOPO_MODULE_DOMAIN,
>> + /*
>> + * If exists, represents a group of tiles within
>> + * an instance of the next domain
>> + *
>> + * On Intel: This level contains a group of Tile
>> + * type as described by CPUID leaf 0x1f
>> + *
>> + * On AMD: This is the group of "Complex" type
>> + * instances as described by CPUID leaf
>> + * 0x8000_0026
>> + */
>> TOPO_TILE_DOMAIN,
>> + /*
>> + * If exists, represents a group of dies within an
>> + * instance of the next domain
>> + *
>> + * On Intel: This level contains group of Die
>> + * type as described by CPUID leaf 0x1f
>> + *
>> + * On AMD: This is the group of "CCD (Die)"
>> + * type instances as described by CPUID leaf
>> + * 0x8000_0026
>> + */
>> TOPO_DIE_DOMAIN,
>> + /*
>> + * If exists, represents a group of packages
>> + * within the root domain
>> + */
>> TOPO_PKG_DOMAIN,
>> + /* Topmost domain with a singular instance */
>> TOPO_ROOT_DOMAIN,
>> TOPO_MAX_DOMAIN,
>> };
>
> Now this begs the obvious question what the actual meaning of these
> domains is and what's their relevance for the kernel.
>
> It's probably undisputed what SMT/CORE mean and what their relevance is.
> The PKG/DIE domains are pretty clear too.

Yup! Those seem to be clear.

>
> Now we have:
>
> MODULE (Intel only)
>
> TILE Intel, AMD names it "Complex"

So here is my interpretation of 0x1f since I could not find much in the
SDM for these level types. The interpretations are based on some of the
past discussions in the community (I'll give the relevant links below).

Intel Jacobsville has a group of cores sharing the L2 cache which the
scheduler currently models as a cluster. Some information about the
same has been shared by Tim Chen in:

https://lore.kernel.org/lkml/737932c9-846a-0a6b-08b8-e2d2d95b67ce@xxxxxxxxxxxxxxx/

Logically, this is what a module should map to IMO but Intel folks can
clarify.

In AMD processors, Complex (CCX) refers to a chiplet where all the CPUs
share the same Last Level Cache (L3). Tiles are slightly different on
Intel since they do not necessarily mark the LLC boundary.

As suggested by Arjan on the thread, perhaps "chiplet" could be used as
a neutral term that describes Tiles on Intel and CCX on AMD.

With this information:

>
> So here are the questions:
>
> - is TILE to "Complex" the proper mapping?

Since "Module", based on my description above, translates to a
group of cores sharing the L2 cache, and since CCD is
interpreted as a "Die", that leaves us with "Complex" mapping
to a "Tile". Chiplet could be a neutral term as suggested by
Arjan.
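
To make the mapping above concrete, here is a minimal C sketch of my
reading. It is not taken from the patch: the intel_level/amd_level tags
and both helper functions are hypothetical names made up for
illustration, the tags are symbolic and are not the raw CPUID level-type
encodings, and the TOPO_SMT_DOMAIN/TOPO_CORE_DOMAIN names are assumed to
follow the pattern of the enum quoted above.

```c
/* Domain order as discussed in this thread; SMT/CORE names are assumed. */
enum topo_domain {
	TOPO_SMT_DOMAIN,
	TOPO_CORE_DOMAIN,
	TOPO_MODULE_DOMAIN,
	TOPO_TILE_DOMAIN,
	TOPO_DIE_DOMAIN,
	TOPO_PKG_DOMAIN,
	TOPO_ROOT_DOMAIN,
	TOPO_MAX_DOMAIN,
};

/* Hypothetical symbolic tags, NOT the raw level-type numbers from CPUID. */
enum intel_level { INTEL_LVL_MODULE, INTEL_LVL_TILE, INTEL_LVL_DIE, INTEL_LVL_MAX };
enum amd_level { AMD_LVL_COMPLEX, AMD_LVL_CCD, AMD_LVL_MAX };

/* Leaf 0x1f: each level type maps to the domain of the same name. */
static const enum topo_domain intel_map[INTEL_LVL_MAX] = {
	[INTEL_LVL_MODULE]	= TOPO_MODULE_DOMAIN,
	[INTEL_LVL_TILE]	= TOPO_TILE_DOMAIN,
	[INTEL_LVL_DIE]		= TOPO_DIE_DOMAIN,
};

/* Leaf 0x8000_0026: "Complex" lands on TILE, "CCD (Die)" lands on DIE. */
static const enum topo_domain amd_map[AMD_LVL_MAX] = {
	[AMD_LVL_COMPLEX]	= TOPO_TILE_DOMAIN,
	[AMD_LVL_CCD]		= TOPO_DIE_DOMAIN,
};

/* Return TOPO_MAX_DOMAIN for any level this sketch does not cover. */
static enum topo_domain intel_level_to_domain(unsigned int lvl)
{
	return lvl < INTEL_LVL_MAX ? intel_map[lvl] : TOPO_MAX_DOMAIN;
}

static enum topo_domain amd_level_to_domain(unsigned int lvl)
{
	return lvl < AMD_LVL_MAX ? amd_map[lvl] : TOPO_MAX_DOMAIN;
}
```

With this layout nothing on the AMD side maps to TOPO_MODULE_DOMAIN,
which matches the "MODULE (Intel only)" note above, and
amd_level_to_domain(AMD_LVL_COMPLEX) yields the same domain as
intel_level_to_domain(INTEL_LVL_TILE).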

>
> - which information is conveyed by MODULE and TILE?

Module: Groups of cores sharing L2 cache (Purely my
interpretation)

Tile: Chiplet. On AMD it also marks the L3 boundary.

>
> - Are these really different between AMD and Intel or is this some
> naming convention issue which needs to be resolved?

They do have different characteristics: on Sapphire Rapids, the
LLC spans the whole socket despite the socket containing multiple
tiles, whereas on AMD each Complex has its own L3. (Please correct
me if I'm wrong, I'm going by the llc_id values shared in this
report by Qiuxu Zhuo:
https://lore.kernel.org/all/20230809161219.83084-1-qiuxu.zhuo@xxxxxxxxx/)

>
> Thanks,
>
> tglx
>

Most of this is based on my interpretation. Please correct me if I've
misinterpreted anything, especially the Intel bits :)

--
Thanks and Regards,
Prateek