Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser

From: Thomas Gleixner
Date: Sat Aug 12 2023 - 16:06:55 EST


On Sat, Aug 12 2023 at 08:21, Rui Zhang wrote:
> On Wed, 2023-08-02 at 12:21 +0200, Thomas Gleixner wrote:
>> +
>> +/*
>> + * Use a lookup table for the case that there are future types > 6
>> which
>> + * describe an intermediate domain level which does not exist today.
>> + *
>> + * A table will also be handy to parse the new AMD 0x80000026 leaf
>> which
>> + * has defined different domain types, but otherwise uses the same
>> layout
>> + * with some of the reserved bits used for new information.
>> + */
>> +static const unsigned int topo_domain_map[MAX_TYPE] = {
>> +       [SMT_TYPE]      = TOPO_SMT_DOMAIN,
>> +       [CORE_TYPE]     = TOPO_CORE_DOMAIN,
>> +       [MODULE_TYPE]   = TOPO_MODULE_DOMAIN,
>> +       [TILE_TYPE]     = TOPO_TILE_DOMAIN,
>> +       [DIE_TYPE]      = TOPO_DIE_DOMAIN,
>> +       [DIEGRP_TYPE]   = TOPO_PKG_DOMAIN,
>
> May I know why DIEGRP_TYPE is mapped to TOPO_PKG_DOMAIN?

Where else should it go? It's the topmost, no? But diegrp is a
terminoligy which is not used in the kernel

>> +
>> +       if (sl.type >= maxtype) {
>
> It is still legal to have sparse type value in the future, and then
> this check will break.
> IMO, it is better to use a function to convert type to domain, and
> check for unknown domain here, say, something like

Why? If somewhere in the future Intel decides to add UBER_TILE_TYPE,
then this will be a type larger than DIEGRP_TYPE. maxtype will then
cover the whole thing and the table will map it to the right place.

Even if in their infinite wisdom the HW folks decide to make a gap, then
the table can handle it simply by putting an invalid value into the gap
and checking for that.

Serioulsy we don't need a switch case for that.
>> +               /*
>> +                * As the subleafs are ordered in domain level order,
>> this
>> +                * could be recovered in theory by propagating the
>> +                * information at the last parsed level.
>> +                *
>> +                * But if the infinite wisdom of hardware folks
>> decides to
>> +                * create a new domain type between CORE and MODULE
>> or DIE
>> +                * and DIEGRP, then that would overwrite the CORE or
>> DIE
>> +                * information.
>
> Sorry that I'm confused here.
>
> Say, we have CORE, FOO, MODULE, then the subleave of FOO must be higher
> than CORE but lower than MODULE.
> so we parse CORE first and propagates the info to FOO/MODULE, and then
> parse FOO and propagate to MODULE, and parse MODULE in the end.
> How could we overwrite the info of a lower level?

We don't know about this new thing yet. So where should we propagate to?
We could say, last was core so we stick the new thing into module, but
do we know that's correct? Do we know there is a module actually. Let me
rephrase that comment.
>> +                *
>> +                * Refuse to implement monstrosity, emit an error and try
>> +                * to survive.
>> +                */
>> +               pr_err_once("Topology: leaf 0x%x:%d Unknown domain type %u\n",
>> +                           leaf, subleaf, sl.type);
>> +               return true;
>
> Don't want to be TLDR, I can think of a couple cases that breaks Linux
> in different ways if we ignore the cpu topology info of an unknown
> level.

Come on. If Intel manages it to create a new level then it's not rocket
science to integrate support for that a long time before actual silicon
ships. So what will break? The machines of people who use ancient
kernels on modern hardware? They can keep the pieces.

> So I just want to understand the strategy here, does this mean that
> we're not looking for a future proof solution, and instead we are
> planning to take future updates (patch enum topo_types/enum
> x86_topology_domains/topo_domain_map) whenever a new level is
> invented?

You need that anyway, no?

> With this, we can guarantee that all the available topology information
> are always valid, even when running on future platforms.

I know that it can be made work, but is it worth the extra effort? I
don't think so.

Thanks,

tglx