Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM

From: Yan Zhao
Date: Fri Aug 11 2023 - 01:49:27 EST


On Thu, Aug 10, 2023 at 09:58:43PM +0800, Chao Gao wrote:
> On Thu, Aug 10, 2023 at 04:56:36PM +0800, Yan Zhao wrote:
> >This is an RFC series trying to fix the issue of unnecessary NUMA
> >protection and TLB-shootdowns found in VMs with assigned devices or VFIO
> >mediated devices during NUMA balance.
> >
> >For VMs with assigned devices or VFIO mediated devices, all or part of
> >guest memory is pinned long-term.
> >
> >Auto NUMA balancing will periodically select VMAs of a process and change
> >protections to PROT_NONE even though some or all pages in the selected
> >ranges are long-term pinned for DMA, which is true for VMs with assigned
> >devices or VFIO mediated devices.
> >
> >Though this will not cause a real problem, because NUMA migration will
> >ultimately reject migration of those kinds of pages and restore those
> >PROT_NONE PTEs, it causes KVM's secondary MMU to be zapped periodically,
> >with identical SPTEs eventually faulted back in, wasting CPU cycles and
> >generating unnecessary TLB-shootdowns.
>
> In my understanding, NUMA balancing also moves tasks closer to the memory
> they are accessing. Can this still work with this series applied?
>
For pages set to PROT_NONE in the primary MMU during the scanning phase, yes;
for pages not set to PROT_NONE, no.
That is because task_numa_migrate() looks to be triggered only on the next
page fault, when a PROT_NONE PTE in an accessible VMA is found.
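
To illustrate the answer above, here is a toy model in C. It is only a
sketch and not kernel code: struct toy_page, toy_scan() and toy_access()
are hypothetical names made up for this example, and the "skip pinned
pages" rule is a simplified reading of what the series aims for. It shows
that only pages marked PROT_NONE by the scan can later raise a hinting
fault, which is the point where task placement may be reconsidered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types and helpers, not kernel code. */
struct toy_page {
    bool pinned;     /* long-term pinned for DMA */
    bool prot_none;  /* set by the scanning phase */
};

/*
 * Scanning phase, assuming long-term pinned pages are skipped:
 * mark every other page PROT_NONE and return how many were marked.
 */
static size_t toy_scan(struct toy_page *pages, size_t n)
{
    size_t marked = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pages[i].pinned)
            continue;            /* no protection change, no zap */
        pages[i].prot_none = true;
        marked++;
    }
    return marked;
}

/*
 * Access to a page: returns true if a NUMA hinting fault fires.
 * Only a PROT_NONE page faults; the fault restores access.
 */
static bool toy_access(struct toy_page *page)
{
    if (!page->prot_none)
        return false;            /* no fault, so no placement decision */
    page->prot_none = false;
    return true;
}
```

In this model, a pinned page never faults after a scan, so it never
contributes to task placement, while an unpinned page faults once on its
next access.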