Re: [PATCH v4 5/7] x86/resctrl: Determine if Sub-NUMA Cluster is enabled and initialize.

From: Reinette Chatre
Date: Mon Aug 28 2023 - 13:07:12 EST


Hi Tony,

On 8/25/2023 10:49 AM, Tony Luck wrote:
> On Fri, Aug 11, 2023 at 10:32:29AM -0700, Reinette Chatre wrote:
>> On 7/22/2023 12:07 PM, Tony Luck wrote:

...

>>> +static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
>>> + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
>>> + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
>>> + X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
>>> + {}
>>> +};
>>> +
>>> +/*
>>> + * There isn't a simple enumeration bit to show whether SNC mode
>>> + * is enabled. Look at the ratio of number of NUMA nodes to the
>>> + * number of distinct L3 caches. Take care to skip memory-only nodes.
>>> + */
>>> +static __init int get_snc_config(void)
>>> +{
>>> + unsigned long *node_caches;
>>> + int mem_only_nodes = 0;
>>> + int cpu, node, ret;
>>> +
>>> + if (!x86_match_cpu(snc_cpu_ids))
>>> + return 1;
>>> +
>>> + node_caches = kcalloc(BITS_TO_LONGS(nr_node_ids), sizeof(*node_caches), GFP_KERNEL);
>>> + if (!node_caches)
>>> + return 1;
>>> +
>>> + cpus_read_lock();
>>> + for_each_node(node) {
>>> + cpu = cpumask_first(cpumask_of_node(node));
>>> + if (cpu < nr_cpu_ids)
>>> + set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
>>> + else
>>> + mem_only_nodes++;
>>> + }
>>> + cpus_read_unlock();
>>
>> I am not familiar with the numa code at all so please correct me
>> where I am wrong. I do see that nr_node_ids is initialized with __init code
>> so it should be accurate at this point. It looks to me like this initialization
>> assumes that at least one CPU per node will be online at the time it is run.
>> It is not clear to me that this assumption would always be true.
>
> Resctrl initialization is kicked off as a late_initcall(). So all CPUs
> and devices are fully initialized before this code runs.
>
> Resctrl can't be moved to an "init" state before CPUs are brought online
> because it makes a call to cpuhp_setup_state() to get callbacks for
> online/offline CPU events ... that call can't be done early.

Apologies, but this is not so obvious to me. From what I understand, a
system need not be booted with all CPUs online. CPUs can be brought
online at any time, including after this late_initcall() has run.
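
To make the concern concrete, here is a minimal userspace sketch. It is
not kernel code: struct node_model and snc_ratio() are names made up for
illustration, and it only mirrors the arithmetic in the patch. It assumes
that cpumask_of_node() reports only online CPUs, so a node whose CPUs are
all offline looks the same as a memory-only node.

```c
#include <assert.h>

/* Hypothetical model of one NUMA node; not a kernel structure. */
struct node_model {
	int l3_id;        /* id of the L3 cache the node's CPUs share */
	int online_cpus;  /* CPUs of this node that are online right now */
};

/*
 * Mirror the arithmetic of get_snc_config():
 *   (nodes with CPUs) / (distinct L3 caches seen)
 * ASSUMPTION: a node with no online CPU is indistinguishable from a
 * memory-only node, so it is skipped in the same way.
 */
static int snc_ratio(const struct node_model *nodes, int nr_nodes)
{
	unsigned long long caches = 0;  /* bitmap of L3 ids seen */
	int skipped = 0, nr_caches = 0, i;

	for (i = 0; i < nr_nodes; i++) {
		assert(nodes[i].l3_id >= 0 && nodes[i].l3_id < 64);
		if (nodes[i].online_cpus > 0)
			caches |= 1ULL << nodes[i].l3_id;
		else
			skipped++;
	}

	/* Count the set bits, one per distinct L3 cache. */
	for (; caches; caches &= caches - 1)
		nr_caches++;

	/* Guard added for the model only; avoids dividing by zero. */
	if (!nr_caches)
		return 1;

	return (nr_nodes - skipped) / nr_caches;
}
```

With two sockets in SNC2 mode (four nodes, two L3 caches) and every node
having an online CPU, this gives 4 / 2 = 2. If all CPUs of one node happen
to be offline when it runs, it gives (4 - 1) / 2 = 1, and under the
assumption above SNC would go undetected.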

>>> +
>>> + ret = (nr_node_ids - mem_only_nodes) / bitmap_weight(node_caches, nr_node_ids);
>>> + kfree(node_caches);
>>> +
>>> + if (ret > 1)
>>> + rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope = MON_SCOPE_NODE;
>>> +
>>> + return ret;
>>> +}
>>> +


Reinette