RE: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores

From: Haiyang Zhang
Date: Wed Nov 15 2023 - 17:54:53 EST

> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, November 15, 2023 8:49 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
> <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
> kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>;
> sharmaajay@xxxxxxxxxxxxx; leon@xxxxxxxxxx; cai.huoqing@xxxxxxxxx;
> ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; tglx@xxxxxxxxxxxxx;
> linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> Cc: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>; Paul Rosswurm
> <paulros@xxxxxxxxxxxxx>; Souradeep Chakrabarti
> <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> Subject: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores
>
> The existing MANA design assigns IRQ affinity to all sibling CPUs of a
> core, which causes IRQ coalescing and may reduce network performance
> with RSS.
>
> Improve performance by adhering to the configuration recommended for
> RSS, which prioritizes IRQ affinity on HT cores.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> ---
> .../net/ethernet/microsoft/mana/gdma_main.c | 126 ++++++++++++++++--
> 1 file changed, 117 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 6367de0c2c2e..839be819d46e 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -1243,13 +1243,115 @@ void mana_gd_free_res_map(struct gdma_resource *r)
>  	r->size = 0;
>  }
>
> +static void cpu_mask_set(cpumask_var_t *filter_mask,
> +			 cpumask_var_t **filter_mask_list)
> +{
> +	unsigned int core_count = 0, cpu;
> +	cpumask_var_t *filter_mask_list_tmp;
> +
> +	BUG_ON(!filter_mask || !filter_mask_list);
> +	filter_mask_list_tmp = *filter_mask_list;
> +	cpumask_copy(*filter_mask, cpu_online_mask);
> +	/* For each core, create a cpumask lookup table
> +	 * which stores all the corresponding siblings.
> +	 */
> +	for_each_cpu(cpu, *filter_mask) {
> +		BUG_ON(!alloc_cpumask_var(&filter_mask_list_tmp[core_count],
> +					  GFP_KERNEL));
> +		cpumask_or(filter_mask_list_tmp[core_count],
> +			   filter_mask_list_tmp[core_count],
> +			   topology_sibling_cpumask(cpu));
> +		cpumask_andnot(*filter_mask, *filter_mask,
> +			       topology_sibling_cpumask(cpu));
> +		core_count++;
> +	}
> +}
> +
> +static int irq_setup(int *irqs, int nvec)
> +{
> +	cpumask_var_t filter_mask;
> +	cpumask_var_t *filter_mask_list;
> +	unsigned int cpu_first, cpu, irq_start, cores = 0;
> +	int i, core_count = 0, numa_node, cpu_count = 0, err = 0;
> +
> +	BUG_ON(!alloc_cpumask_var(&filter_mask, GFP_KERNEL));
> +	cpus_read_lock();
> +	cpumask_copy(filter_mask, cpu_online_mask);
> +	/* Count the number of physical cores. */
> +	for_each_cpu(cpu, filter_mask) {
> +		cpumask_andnot(filter_mask, filter_mask,
> +			       topology_sibling_cpumask(cpu));
> +		cores++;
> +	}
> +	filter_mask_list = kcalloc(cores, sizeof(cpumask_var_t), GFP_KERNEL);
> +	if (!filter_mask_list) {
> +		err = -ENOMEM;
> +		goto free_irq;
> +	}
> +	/* If the number of CPUs is equal to max_queues per port, one
> +	 * extra interrupt is needed for the hardware channel
> +	 * communication.
> +	 */
> +	if (nvec - 1 == num_online_cpus()) {
> +		irq_start = 1;
> +		cpu_first = cpumask_first(cpu_online_mask);
> +		irq_set_affinity_and_hint(irqs[0], cpumask_of(cpu_first));
> +	} else {
> +		irq_start = 0;
> +	}
> +	/* Reset core_count and numa_node to 0. */
> +	core_count = 0;
> +	numa_node = 0;

Please start with gc->numa_node here. I know it is 0 for now, but in
the future the host will provide the real NUMA node number closest to
the device.

Also, as we discussed, consider using the NUMA distance to select the
next NUMA node (in a separate patch).
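
A minimal sketch of the first point, assuming irq_setup() gains a
struct gdma_context *gc parameter (the parameter is just for
illustration, not part of this patch):

	/* Start from the NUMA node closest to the device rather than
	 * node 0. gc->numa_node is 0 today, but the host will report
	 * the real node in the future.
	 */
	numa_node = gc->numa_node;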

> +	cpu_mask_set(&filter_mask, &filter_mask_list);
> +	/* For each interrupt, find the first CPU of a particular
> +	 * sibling set. If it belongs to the specific NUMA node, then
> +	 * assign the IRQ to it, clear that CPU bit from the
> +	 * corresponding sibling list in filter_mask_list, and
> +	 * increase the cpu_count for that node.
> +	 * Once all CPUs of a NUMA node are assigned, move to the
> +	 * next NUMA node and continue the same way.
> +	 */
> +	for (i = irq_start; i < nvec; ) {
> +		cpu_first = cpumask_first(filter_mask_list[core_count]);
> +		if (cpu_first < nr_cpu_ids &&
> +		    cpu_to_node(cpu_first) == numa_node) {
> +			irq_set_affinity_and_hint(irqs[i],
> +						  cpumask_of(cpu_first));
> +			cpumask_clear_cpu(cpu_first,
> +					  filter_mask_list[core_count]);
> +			cpu_count = cpu_count + 1;
> +			i = i + 1;
> +			/* Check whether all the CPUs of the
> +			 * particular node have been used.
> +			 */
> +			if (cpu_count == nr_cpus_node(numa_node)) {
> +				numa_node = numa_node + 1;
> +				if (numa_node == num_online_nodes()) {
> +					cpu_mask_set(&filter_mask,
> +						     &filter_mask_list);
> +					numa_node = 0;
Ditto: please restart from gc->numa_node here as well, instead of 0.
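
For the separate NUMA-distance patch, something along these lines
could replace the numa_node + 1 stepping. This is only an untested
sketch; next_numa_node() and the visited nodemask are hypothetical
names, not existing helpers:

	/* Hypothetical helper: pick the nearest not-yet-visited
	 * online node by NUMA distance, starting from @cur. Needs
	 * <linux/nodemask.h> and <linux/topology.h>.
	 */
	static int next_numa_node(int cur, nodemask_t *visited)
	{
		int node, best = NUMA_NO_NODE;
		int best_dist = INT_MAX;

		for_each_online_node(node) {
			if (node_isset(node, *visited))
				continue;
			if (node_distance(cur, node) < best_dist) {
				best_dist = node_distance(cur, node);
				best = node;
			}
		}
		if (best != NUMA_NO_NODE)
			node_set(best, *visited);
		return best;
	}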

Other things look good to me.

Thanks,
- Haiyang