Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM

From: Chao Gao
Date: Thu Aug 10 2023 - 09:59:05 EST


On Thu, Aug 10, 2023 at 04:56:36PM +0800, Yan Zhao wrote:
>This is an RFC series trying to fix the issue of unnecessary NUMA
>protection and TLB-shootdowns found in VMs with assigned devices or VFIO
>mediated devices during NUMA balancing.
>
>For VMs with assigned devices or VFIO mediated devices, all or part of
>guest memory is pinned long-term.
>
>Auto NUMA balancing periodically selects VMAs of a process and changes
>their protections to PROT_NONE, even though some or all pages in the
>selected ranges are long-term pinned for DMA, which is the case for VMs
>with assigned devices or VFIO mediated devices.
>
>Though this does not cause a real problem, because NUMA migration will
>ultimately reject migration of those kinds of pages and restore those
>PROT_NONE PTEs, it causes KVM's secondary MMU to be zapped periodically,
>with identical SPTEs eventually faulted back in, wasting CPU cycles and
>generating unnecessary TLB-shootdowns.

In my understanding, NUMA balancing also moves tasks closer to the memory
they are accessing. Can this still work with this series applied?

>
>This series first introduces a new flag MMU_NOTIFIER_RANGE_NUMA in patch 1
>to work with mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that
>the subscriber (e.g. KVM) of the mmu notifier can know that an invalidation
>event is sent specifically for NUMA migration.
>
>Patch 2 skips setting PROT_NONE on long-term pinned pages in the primary
>MMU, avoiding the page faults introduced by NUMA protection and the
>restoration of old huge PMDs/PTEs in the primary MMU.
>
>Patch 3 introduces a new mmu notifier callback .numa_protect(), which
>is called in patch 4 once a page is certain to be PROT_NONE protected.
>
>Then in patch 5, KVM can recognize that a .invalidate_range_start()
>notification is specifically for NUMA balancing and defer the page unmap
>in the secondary MMU until .numa_protect() arrives.
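
The flow described for patches 1 and 3-5 can be sketched as a userspace toy
model. Everything here is an assumption made for illustration: the toy_*
names, the TOY_RANGE_NUMA value, and the per-page callback granularity are
hypothetical and are not taken from the patches themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES	8
/* Hypothetical stand-in for MMU_NOTIFIER_RANGE_NUMA; real value not shown here. */
#define TOY_RANGE_NUMA	(1u << 0)

/* Toy secondary MMU: one "SPTE present" bit per guest page. */
struct toy_vm {
	bool spte_present[TOY_NPAGES];
	unsigned int zap_count;
};

/* Drop the secondary-MMU mappings in [start, end). */
static void toy_zap(struct toy_vm *vm, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);
	for (size_t i = start; i < end; i++) {
		if (vm->spte_present[i]) {
			vm->spte_present[i] = false;
			vm->zap_count++;
		}
	}
}

/* Models .invalidate_range_start(): NUMA-flagged ranges are not zapped yet. */
static void toy_invalidate_range_start(struct toy_vm *vm, size_t start,
				       size_t end, unsigned int flags)
{
	if (flags & TOY_RANGE_NUMA)
		return;		/* defer until toy_numa_protect() */
	toy_zap(vm, start, end);
}

/* Models .numa_protect(): the page really was protected, so unmap it. */
static void toy_numa_protect(struct toy_vm *vm, size_t start, size_t end)
{
	toy_zap(vm, start, end);
}

/* Models one NUMA scan of the primary MMU over all toy pages. */
static void toy_numa_scan(struct toy_vm *vm, const bool *pinned)
{
	toy_invalidate_range_start(vm, 0, TOY_NPAGES, TOY_RANGE_NUMA);
	for (size_t i = 0; i < TOY_NPAGES; i++) {
		if (pinned[i])
			continue;	/* never protected, so never unmapped */
		toy_numa_protect(vm, i, i + 1);
	}
}
```

With every page pinned, toy_numa_scan() leaves all toy SPTEs in place, while
a range invalidated without the NUMA flag is still zapped immediately.
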
>
>
>Changelog:
>RFC v1 --> v2:
>1. Added patches 3-4 to introduce a new callback .numa_protect().
>2. Rather than have KVM duplicate the logic to check whether a page is
>pinned long-term, let KVM depend on the new callback .numa_protect() to do
>the page unmap in the secondary MMU for NUMA migration.
>
>RFC v1:
>https://lore.kernel.org/all/20230808071329.19995-1-yan.y.zhao@xxxxxxxxx/
>
>Yan Zhao (5):
>  mm/mmu_notifier: introduce a new mmu notifier flag
>    MMU_NOTIFIER_RANGE_NUMA
>  mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate
>    purpose
>  mm/mmu_notifier: introduce a new callback .numa_protect
>  mm/autonuma: call .numa_protect() when page is protected for NUMA
>    migrate
>  KVM: Unmap pages only when it's indeed protected for NUMA migration
>
> include/linux/mmu_notifier.h | 16 ++++++++++++++++
> mm/huge_memory.c             |  6 ++++++
> mm/mmu_notifier.c            | 18 ++++++++++++++++++
> mm/mprotect.c                | 10 +++++++++-
> virt/kvm/kvm_main.c          | 25 ++++++++++++++++++++++---
> 5 files changed, 71 insertions(+), 4 deletions(-)
>
>--
>2.17.1
>