Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser

From: Zhang, Rui
Date: Mon Aug 14 2023 - 11:29:35 EST


Hi, Thomas,

On Mon, 2023-08-14 at 14:26 +0200, Thomas Gleixner wrote:
> > On Sun, 2023-08-13 at 17:04 +0200, Thomas Gleixner wrote:
> >
> > With this, we set dom_offset[DIE] to 7 first when parsing TILE, and
> > then overwrite it to 8 when parsing UBER_TILE, and set
> > dom_offset[PACKAGE] to 9 when parsing DIE.
> >
> > Losing TILE.eax.shifts is okay, because it is for UBER_TILE id.
>
> No. That's just wrong. TILE is defined and potentially used in the
> kernel.

Sure.

> How can you rightfully assume that UBER TILE is a valid
> substitution? You can't.

TILE.eax.shifts tells us three things:
1. The maximum number of addressable threads in the TILE domain, which
   should be saved in x86_topo_system.dom_size[TILE].
2. The highest bit of the tile ID in the APIC ID, but we don't need this
   if we use a package/system-scope unique tile ID.
3. The lowest bit in the APIC ID of the level above TILE.
   - If the upper level is a known level, say, die, this info is saved
     in dom_offset[die].
   - If the upper level is an unknown level, then we don't need this to
     decode the topology information for the unknown level.
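
To make the three uses above concrete, here is a minimal C sketch. It is
illustrative only and not code from the patch set: the helper names are
made up, and the shift values in the usage note (CORE = 5, TILE = 7) are
simply borrowed from the examples in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these exist in the kernel. */

/* Use 1: maximum number of addressable threads inside one domain. */
static inline uint32_t dom_max_threads(unsigned int shift)
{
	assert(shift < 32);
	return UINT32_C(1) << shift;
}

/*
 * Use 2: ID of a domain relative to its parent, i.e. APIC ID bits
 * [lower_shift, shift). 'shift' is only needed to mask off the upper
 * bits; a package-scope unique ID would not apply this mask.
 */
static inline uint32_t dom_relative_id(uint32_t apicid,
				       unsigned int lower_shift,
				       unsigned int shift)
{
	assert(lower_shift <= shift && shift < 32);
	return (apicid >> lower_shift) &
	       ((UINT32_C(1) << (shift - lower_shift)) - 1);
}

/* Use 3: the ID bits of the next-higher level start at bit 'shift'. */
static inline uint32_t dom_upper_bits(uint32_t apicid, unsigned int shift)
{
	assert(shift < 32);
	return apicid >> shift;
}
```

With CORE.shift = 5 and TILE.shift = 7, an APIC ID of 0x1a5 has relative
tile ID 1, and everything from bit 7 upwards (value 3) belongs to
whatever level sits above TILE.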

Maybe I missed something, but for now I don't see how things break here.

>
> > Currently, die topology information is mandatory in Linux, we cannot
> > make it right without patching enum topo_types/enum
> > x86_topology_domains/topo_domain_map (which in fact tells the
> > relationship between DIE and FOO).
>
> You cannot just willy-nilly assume at which domain level FOO sits.

Exactly.

> Look
> at your example:
>
> > Say, we have new level FOO, and the CPUID is like this
> > level   type            eax.shifts
> > 0       SMT             1
> > 1       CORE            5
> > 2       FOO             8
>
> FOO can be anything between CORE and PKG, so you cannot tell what it
> means.

Exactly. Anything related with MODULE/TILE/DIE can break in this case.

Say this is a system with 1 package, 2 FOOs, 8 cores.

In the current design (in this patch set), the kernel has to tell how
many dies/tiles/modules this system has, and it cannot do this right.

But with an optional die (and likewise optional module/tile), the kernel
can tell that this is a 1-package-0-die-0-tile-0-module-8-core system
before knowing what FOO means; we don't need to make up anything we
don't know.
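
A rough sketch of what "optional" could look like, assuming a parser that
records only the levels it recognizes. All names here are hypothetical
and do not match the enums or structures in the patch set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not the kernel's enum or struct. */
enum dom_type {
	DOM_SMT, DOM_CORE, DOM_MODULE, DOM_TILE, DOM_DIE,
	DOM_MAX,
	DOM_UNKNOWN = DOM_MAX	/* any level type we do not know, e.g. FOO */
};

struct level {
	enum dom_type type;
	unsigned int shift;	/* eax.shifts of this level */
};

struct topo {
	bool present[DOM_MAX];		/* false: domain was not enumerated */
	unsigned int shift[DOM_MAX];	/* valid only where present[] is true */
	unsigned int pkg_shift;		/* lowest APIC ID bit of the package ID */
};

static void parse_levels(const struct level *lv, size_t n, struct topo *t)
{
	assert(t != NULL);
	*t = (struct topo){ 0 };

	for (size_t i = 0; i < n; i++) {
		/* Record known levels; make up nothing for unknown ones. */
		if (lv[i].type < DOM_MAX) {
			t->present[lv[i].type] = true;
			t->shift[lv[i].type] = lv[i].shift;
		}
		/* Known or not, the last level's shift bounds the package. */
		t->pkg_shift = lv[i].shift;
	}
}
```

Feeding it the SMT/CORE/FOO table from above leaves module, tile and die
marked as absent, while the package boundary still ends up at bit 8.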

>
> Simple heuristics _cannot_ be correct by definition. So why trying to
> come up with them just because?
>
> What's the problem you are trying to solve? Some real world issue or
> some academic thought experiment which might never become a real
> problem?
>
Maybe I was unclear previously. I totally agree with your points, and
"using optional die/tile/module" is what I propose to address these
concerns.

thanks,
rui