Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM

From: David Hildenbrand
Date: Thu Aug 17 2023 - 03:40:20 EST


On 17.08.23 07:05, Yan Zhao wrote:
> On Wed, Aug 16, 2023 at 11:00:36AM -0700, John Hubbard wrote:
>> On 8/16/23 02:49, David Hildenbrand wrote:
>>> But do 32-bit architectures even care about NUMA hinting? If not, just
>>> ignore them ...
>>
>> Probably not!
>>
> ...
>>>> So, do you mean that the kernel should provide a per-VMA allow/disallow
>>>> mechanism, and it's up to user space to choose between the per-VMA
>>>> (more complex) way and the global (simpler) way?
>>>
>>> QEMU could do either way. The question would be whether a per-VMA setting
>>> makes sense for NUMA hinting.
>>
>> From our experience with compute on GPUs, a per-mm setting would suffice.
>> No need to go all the way to VMA granularity.
>
> After an offline internal discussion, we think a per-mm setting is also
> enough for device passthrough in VMs.
>
> BTW, if we want a per-VMA flag, compared to VM_NO_NUMA_BALANCING, do you
> think there is any value in providing a flag like VM_MAYDMA?
> Auto NUMA balancing or other components could then decide how to use it
> themselves.
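
(Just to make the idea concrete: neither MMF_NO_NUMA_BALANCING, VM_NO_NUMA_BALANCING
nor VM_MAYDMA exist upstream. The sketch below is only my assumption of where such a
per-mm or per-VMA opt-out would naturally be tested, i.e. in the hinting scan done by
task_numa_work(); it is not how any current code looks.)

#include <linux/mm.h>
#include <linux/mempolicy.h>

/* Hypothetical per-mm bit, e.g. set by QEMU via some prctl(). */
static bool mm_wants_numa_hinting(struct mm_struct *mm)
{
	return !test_bit(MMF_NO_NUMA_BALANCING, &mm->flags);
}

/* Hypothetical per-VMA flag, e.g. set via some madvise()-like interface. */
static bool vma_wants_numa_hinting(struct vm_area_struct *vma)
{
	if (vma->vm_flags & VM_NO_NUMA_BALANCING)
		return false;
	/* existing helper: skip VMAs whose pages can't be migrated anyway */
	return vma_migratable(vma);
}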

Short-lived DMA is not really the problem. The problem is long-term pinning.

There was a discussion about letting user space similarly hint that long-term pinning might/will happen.

Because when long-term pinning a page, we have to make sure to migrate it off ZONE_MOVABLE / MIGRATE_CMA first.

But the kernel prefers to place pages there.
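
(For context, a simplified sketch of the decision the long-term pin path makes is below;
it only covers the two cases mentioned here. The real logic lives around
folio_is_longterm_pinnable() and the migration fallback in mm/gup.c and handles more
cases, so treat this as an illustration, not the actual code.)

#include <linux/mm.h>
#include <linux/mmzone.h>

static bool sketch_longterm_pinnable(struct page *page)
{
	/* CMA pageblocks must stay migratable for future CMA allocations. */
	if (get_pageblock_migratetype(page) == MIGRATE_CMA)
		return false;

	/*
	 * ZONE_MOVABLE exists so that memory can always be migrated (e.g.
	 * for memory offlining); a long-term pin would defeat that, so the
	 * page has to be migrated to a pinnable zone first.
	 */
	if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
		return false;

	return true;
}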

So with vfio in QEMU, we might preallocate memory for the guest and have it placed on ZONE_MOVABLE/MIGRATE_CMA, only for long-term pinning to then have to migrate all these fresh pages out of those areas again.

So letting the kernel know about that in this context might also help.
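
(That hint could look something like the sketch below: QEMU marks the guest-RAM mapping
before preallocating/touching it, so the kernel can allocate from pinnable memory right
away instead of migrating everything at vfio pin time. To be clear, no such madvise()
advice exists upstream; the flag value is made up purely for illustration.)

#include <sys/mman.h>

/* Made-up advice value, for illustration only -- no such flag exists. */
#define MADV_LONGTERM_PIN	77

/* Hypothetical usage from QEMU before preallocating guest RAM. */
static void hint_guest_ram(void *addr, size_t len)
{
	(void)madvise(addr, len, MADV_LONGTERM_PIN);
}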

--
Cheers,

David / dhildenb