Re: [PATCH net-next V4 1/3] sched/topology: Add NUMA-based CPUs spread API

From: Valentin Schneider
Date: Tue Aug 09 2022 - 06:02:21 EST


On 08/08/22 17:39, Tariq Toukan wrote:
> On 8/4/2022 8:28 PM, Valentin Schneider wrote:
>> On 28/07/22 22:12, Tariq Toukan wrote:
>>> +static bool sched_cpus_spread_by_distance(int node, u16 *cpus, int ncpus)
>> ^^^^^^^^^
>> That should be a struct *cpumask.
>
> With cpumask, we'll lose the order.
>

Right, I didn't get that from the changelog.

>>
>>> +{
>>> + cpumask_var_t cpumask;
>>> + int first, i;
>>> +
>>> + if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
>>> + return false;
>>> +
>>> + cpumask_copy(cpumask, cpu_online_mask);
>>> +
>>> + first = cpumask_first(cpumask_of_node(node));
>>> +
>>> + for (i = 0; i < ncpus; i++) {
>>> + int cpu;
>>> +
>>> + cpu = sched_numa_find_closest(cpumask, first);
>>> + if (cpu >= nr_cpu_ids) {
>>> + free_cpumask_var(cpumask);
>>> + return false;
>>> + }
>>> + cpus[i] = cpu;
>>> + __cpumask_clear_cpu(cpu, cpumask);
>>> + }
>>> +
>>> + free_cpumask_var(cpumask);
>>> + return true;
>>> +}
>>
>> This will fail if ncpus > nr_cpu_ids, which shouldn't be a problem. It
>> would make more sense to set *up to* ncpus, the calling code can then
>> decide if getting fewer than requested is OK or not.
>>
>> I also don't get the fallback to cpumask_local_spread(), is that if the
>> NUMA topology hasn't been initialized yet? It feels like most users of this
>> would invoke it late enough (i.e. anything after early initcalls) to have
>> the backing data available.
>
> I don't expect this to fail, as we invoke it late enough. Fallback is
> there just in case, to preserve the old behavior instead of getting
> totally broken.
>

Then there shouldn't be a fallback method - the main method is expected to
work.

>>
>> Finally, I think iterating only once per NUMA level would make more sense.
>
> Agree, although it's just a setup stage.
> I'll check if it can work for me, based on the reference code below.
>
>>
>> I've scribbled something together from those thoughts, see below. This has
>> just the mlx5 bits touched to show what I mean, but that's just compile
>> tested.
>
> My function returns a *sorted* list of the N closest cpus.
> That is important. In many cases, drivers do not need all N irqs, but
> only a portion of it, so it wants to use the closest subset of cpus.
>

Are there cases where we can't figure this out in advance? From what I grok
out of the two callsites you patched, all vectors will be used unless some
error happens, so compressing the CPUs in a single cpumask seemed
sufficient.