Re: [PATCH v3 1/3] RAS: Introduce AMD Address Translation Library

From: Yazen Ghannam
Date: Wed Dec 13 2023 - 12:04:13 EST


On 12/13/2023 11:48 AM, Borislav Petkov wrote:
> On Wed, Dec 13, 2023 at 10:35:55AM -0500, Yazen Ghannam wrote:
>> I agree in principle. But I don't think it hurts to include an additional
>> line to avoid the confusion when the module doesn't load.
>
> It does hurt because this turns into constant family updating the moment
> a new family appears. This is one of the major reasons why we do CPUID
> bits.
>
>> Also, the SMCA feature is used here as a short-cut to match on systems with
>> a Data Fabric. We could use the Zen feature in the same way.
>
> We could.
>
> What is the main description of the environment an ATL library belongs
> into: a SMCA system or a Zen-based system?


Systems with an AMD Data Fabric.

I wrote up this comment for the amd_atl_cpuids[].

/*
 * This library provides functionality for AMD-based systems with a Data
 * Fabric. The set of systems with a Data Fabric is equivalent to the set of
 * Zen-based systems and the set of systems with the Scalable MCA feature at
 * this time. However, these are technically independent things.
 *
 * It's possible to match on the PCI IDs of the Data Fabric devices, but this
 * will be an ever-expanding list. Instead match on the SMCA and Zen features
 * to cover all relevant systems.
 */
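
For illustration only, here is a rough userspace model of the "match on features, not on a family list" idea described in the comment above. It is not the kernel's x86_match_cpu() and not the actual amd_atl_cpuids[] table; every model_* and MODEL_* name and the flag values are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel feature flags; values are arbitrary. */
enum model_feature {
	MODEL_FEATURE_SMCA = 1u << 0,
	MODEL_FEATURE_ZEN  = 1u << 1,
};

/* One entry matches any CPU that advertises the given feature. */
struct model_cpu_id {
	unsigned int feature;	/* 0 terminates the table */
};

/* Two feature entries instead of an ever-growing list of families or PCI IDs. */
static const struct model_cpu_id model_atl_cpuids[] = {
	{ MODEL_FEATURE_SMCA },
	{ MODEL_FEATURE_ZEN },
	{ 0 }
};

/* Return true if the CPU has at least one feature listed in the table. */
static bool model_match_cpu(const struct model_cpu_id *table,
			    unsigned int cpu_features)
{
	assert(table != NULL);

	for (; table->feature; table++) {
		if (cpu_features & table->feature)
			return true;
	}

	return false;
}
```

In this model a new family that still sets either flag matches without any table update, while a legacy system with neither flag does not.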

We could also introduce another software feature bit for Data Fabrics, "DF", and set it when we discover them, as in the AMD_NB code.

Thoughts?

The library init has three checks for system support. Comments added here.

	// Load on systems with Data Fabrics (ZEN || SMCA).
	// Filters out legacy systems.
	if (!x86_match_cpu(amd_atl_cpuids))
		return -ENODEV;

	// Make sure the kernel recognizes this system's Data Fabric.
	// Filters out new hardware.
	if (!amd_nb_num())
		return -ENODEV;

	...

	// Make sure the library supports this Data Fabric revision.
	// Filters out totally new logic that requires library updates.
	if (get_df_system_info())
		return -ENODEV;
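
To make the filtering order concrete, the three checks can be modeled in userspace roughly as below. This is only a sketch under assumptions: struct model_system, its fields, and the revision cutoff are invented for the example, and the "nonzero means unsupported" convention is inferred from how get_df_system_info() is tested above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the three checks look at. */
struct model_system {
	bool has_zen_or_smca;	/* stands in for x86_match_cpu(amd_atl_cpuids) */
	int num_df_nodes;	/* stands in for amd_nb_num() */
	int df_rev;		/* Data Fabric revision; 0 if unknown */
};

/* Arbitrary cutoff for the sketch, not a real hardware revision limit. */
#define MODEL_MAX_SUPPORTED_DF_REV 4

/* Return 0 if the modeled library knows this revision, nonzero otherwise. */
static int model_get_df_system_info(const struct model_system *sys)
{
	if (sys->df_rev < 1 || sys->df_rev > MODEL_MAX_SUPPORTED_DF_REV)
		return -EINVAL;

	return 0;
}

static int model_atl_init(const struct model_system *sys)
{
	/* Check 1: filters out legacy systems. */
	if (!sys->has_zen_or_smca)
		return -ENODEV;

	/* Check 2: filters out hardware the kernel does not recognize yet. */
	if (!sys->num_df_nodes)
		return -ENODEV;

	/* Check 3: filters out revisions that need library updates. */
	if (model_get_df_system_info(sys))
		return -ENODEV;

	return 0;
}
```

Each stage rejects a different class of system, so all three return the same -ENODEV even though the reasons differ.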

Thanks,
Yazen