Re: [PATCH v2 00/28] Allow parallel MMU operations with TDP MMU

From: Paolo Bonzini
Date: Wed Feb 03 2021 - 06:02:07 EST


On 02/02/21 19:57, Ben Gardon wrote:
> The TDP MMU was implemented to simplify and improve the performance of
> KVM's memory management on modern hardware with TDP (EPT / NPT). To build
> on the existing performance improvements of the TDP MMU, add the ability
> to handle vCPU page faults, enable and disable dirty logging, and remove
> mappings in parallel. In the current implementation, vCPU page faults
> (actually EPT/NPT violations/misconfigurations) are the largest source of
> MMU lock contention on VMs with many vCPUs. This contention, and the
> resulting page fault latency, can soft-lock guests and degrade
> performance. Handling page faults in parallel is especially useful when
> booting VMs, enabling dirty logging, and handling demand paging. In all
> these cases vCPUs are constantly incurring page faults on each new page
> accessed.

> Broadly, the following changes were required to allow parallel page
> faults (and other MMU operations):
> -- Contention detection and yielding added to rwlocks to bring them up to
>    feature parity with spinlocks, at least as far as the use of the MMU
>    lock is concerned.
> -- TDP MMU page table memory is protected with RCU and freed in RCU
>    callbacks to allow multiple threads to operate on that memory
>    concurrently.
> -- The MMU lock was changed to an rwlock on x86. This allows the page
>    fault handlers to acquire the MMU lock in read mode and handle page
>    faults in parallel, while other operations maintain exclusive use of
>    the lock by acquiring it in write mode.
> -- An additional lock is added to protect some data structures needed by
>    the page fault handlers, for relatively infrequent operations.
> -- The page fault handler is modified to use atomic cmpxchgs to set SPTEs,
>    and some page fault handler operations are modified slightly to work
>    concurrently with other threads.

> This series also contains a few bug fixes and optimizations related to
> the above, but not strictly part of enabling parallel page fault handling.

> Correctness testing:
> The following tests were performed with an SMP kernel and DBX kernel on an
> Intel Skylake machine. The tests were run both with and without the TDP
> MMU enabled.
> -- This series introduces no new failures in kvm-unit-tests
>    SMP + no TDP MMU   no new failures
>    SMP + TDP MMU      no new failures
>    DBX + no TDP MMU   no new failures
>    DBX + TDP MMU      no new failures

What's DBX? Lockdep etc.?

> -- All KVM selftests behave as expected
>    SMP + no TDP MMU   all pass except ./x86_64/vmx_preemption_timer_test
>    SMP + TDP MMU      all pass except ./x86_64/vmx_preemption_timer_test
>    (./x86_64/vmx_preemption_timer_test also fails without this patch set,
>    both with the TDP MMU on and off.)

Yes, it's flaky. It depends on your host.

>    DBX + no TDP MMU   all pass
>    DBX + TDP MMU      all pass
> -- A VM can be booted running Debian 9 and all of its memory accessed
>    SMP + no TDP MMU   works
>    SMP + TDP MMU      works
>    DBX + no TDP MMU   works
>    DBX + TDP MMU      works

> This series can be viewed in Gerrit at:
> https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/7172

Looks good! I'll wait for a few days of reviews, but I'd like to queue this for 5.12 and I plan to make it the default in 5.13 or 5.12-rc (depending on when I can ask Red Hat QE to give it a shake).

It also needs more documentation though. I'll do that myself based on your KVM Forum talk so that I can teach myself more of it.

Paolo