Re: [PATCH 1/2] perf/x86/rapl: Add support for Intel Meteor Lake

From: Zhang, Rui
Date: Sat Jan 07 2023 - 09:07:55 EST


On Fri, 2023-01-06 at 06:50 -0800, Dave Hansen wrote:
> On 1/6/23 06:38, Zhang, Rui wrote:
> > My original proposal is: instead of maintaining model lists in a
> > series of different drivers, can we use feature flags and maintain
> > them in a central place? Say, something like
> >
> > static const struct x86_cpu_id intel_pm_features[] __initconst = {
> > 	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_L, X86_FEATURE_RAPL | X86_FEATURE_TCC_COOLING),
> > 	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, X86_FEATURE_RAPL | X86_FEATURE_UNCORE_FREQ),
> > 	...
> > 	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, X86_FEATURE_RAPL | X86_FEATURE_TCC_COOLING),
> > 	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, X86_FEATURE_RAPL | X86_FEATURE_UNCORE_FREQ),
> > 	...
> > {},
> > };
> > And then set the feature flags based on this, and make the drivers
> > test the feature flags.
>
> That works if you have very few features. SKYLAKE_X looks to have on
> the order of 15 model-specific features, or at least references in
> the code.
>
> That means that the
>
> X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, ...
>
> list goes on for 15 features. It's even worse than that because you'd
> *like* to be able to scan up and down the list looking for, say, "all
> the CPUs that support RAPL". But, if you do that, you actually need a
> table -- a really wide table -- for *all* the features and a column
> for each.

That's true.
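
To make the trade-off concrete, here is a minimal userspace sketch of the
"central table plus feature flags" idea. Everything in it is hypothetical:
the PM_FEAT_* bits, the pm_model enum and both helpers are made up for
illustration and are not the kernel's X86_FEATURE_* or x86_cpu_id
definitions. Only the four example rows from the proposal are filled in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, NOT the kernel's X86_FEATURE_* values. */
#define PM_FEAT_RAPL        (1u << 0)
#define PM_FEAT_TCC_COOLING (1u << 1)
#define PM_FEAT_UNCORE_FREQ (1u << 2)

/* Placeholder model identifiers, for illustration only. */
enum pm_model {
	PM_MODEL_SKYLAKE_L,
	PM_MODEL_SKYLAKE_X,
	PM_MODEL_ALDERLAKE,
	PM_MODEL_SAPPHIRERAPIDS_X,
	PM_MODEL_UNKNOWN,
};

struct pm_model_features {
	enum pm_model model;
	uint32_t features;	/* OR of PM_FEAT_* bits */
};

/* One central table: one row per model, one bit per feature. */
static const struct pm_model_features pm_features[] = {
	{ PM_MODEL_SKYLAKE_L,        PM_FEAT_RAPL | PM_FEAT_TCC_COOLING },
	{ PM_MODEL_SKYLAKE_X,        PM_FEAT_RAPL | PM_FEAT_UNCORE_FREQ },
	{ PM_MODEL_ALDERLAKE,        PM_FEAT_RAPL | PM_FEAT_TCC_COOLING },
	{ PM_MODEL_SAPPHIRERAPIDS_X, PM_FEAT_RAPL | PM_FEAT_UNCORE_FREQ },
};

/* Return the feature mask for a model, or 0 if the model is not listed. */
uint32_t pm_lookup_features(enum pm_model model)
{
	size_t i;

	for (i = 0; i < sizeof(pm_features) / sizeof(pm_features[0]); i++) {
		if (pm_features[i].model == model)
			return pm_features[i].features;
	}
	return 0;
}

/* A driver would test a flag instead of carrying its own model list. */
int pm_has_feature(enum pm_model model, uint32_t feature)
{
	return (pm_lookup_features(model) & feature) == feature;
}
```

With only three feature bits each row stays short. With on the order of
15 model-specific features per model, each row turns into a long OR
expression, which is the readability problem described above.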

>
> What we have now isn't bad. The only real way to fix this is to have
> the features enumerated *properly*, aka. architecturally.
>
> I get it, Intel doesn't want to dedicate CPUID bits and architecture
> to one-offs.

> But, at the point that there are a dozen CPU models across
> three or four different CPU generations, it's time to revisit it.
> Could you help our colleagues revisit it, please?

For this RAPL case, I think the biggest problem is the RAPL
*incompatibilities* between model variants, as Ingo pointed out.
So a CPUID bit alone cannot solve all the problems.

But given that the biggest inconsistency is the energy unit used on
different generations, I can also check with our colleagues whether
there is a software-visible way to get the "fixed" energy units, rather
than hardcoding them in the driver using a model list.

thanks,
rui