Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs

From: Nitesh Narayan Lal
Date: Mon Sep 21 2020 - 23:08:33 EST



On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>> Nitesh Narayan Lal wrote:
>>
>>> In a realtime environment, it is essential to isolate unwanted IRQs from
>>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>>> based on the online CPUs could lead to a potential issue on an RT setup
>>> that has several isolated CPUs but a very few housekeeping CPUs. This is
>>> because in these kinds of setups an attempt to move the IRQs to the
>>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>>> CPU vector limit. This could eventually result in latency spikes because
>>> of the IRQ threads that we fail to move from isolated CPUs.
>>>
>>> This patch prevents i40e to add vectors only based on available
>>> housekeeping CPUs by using num_housekeeping_cpus().
>>>
>>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx>
>> The driver changes are straightforward, but this isn't the only driver
>> with this issue, right? I'm sure ixgbe and ice both have this problem
>> too, you should fix them as well, at a minimum, and probably other
>> vendors drivers:
>>
>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>> ...
>> 50 files contained matches
> Ouch, I was indeed surprised that these MSI vector allocations were done
> at the driver level and not at some $SUBSYSTEM level.
>
> The logic is already there in the driver so I wouldn't oppose to this very patch
> but would a shared infrastructure make sense for this? Something that would
> also handle hotplug operations?
>
> Does it possibly go even beyond networking drivers?

>From a generic solution perspective, I think it makes sense to come up with a
shared infrastructure.
Something that can be consumed by all the drivers and maybe hotplug operations
as well (I will have to further explore the hotplug part).

However, there are RT workloads that are getting affected because of this
issue, so does it make sense to go ahead with this per-driver basis approach
for now?

Since a generic solution will require a fair amount of testing and
understanding of different drivers. Having said that, I can definetly start
looking in that direction.

Thanks for reviewing.

> Thanks.
>
>> for this patch i40e
>> Acked-by: Jesse Brandeburg <jesse.brandeburg@xxxxxxxxx>
--
Nitesh

Attachment: signature.asc
Description: OpenPGP digital signature