[PATCH] KVM: VMX: Optimize flushing the PML buffer

From: Ben Gardon
Date: Thu Feb 04 2021 - 17:20:47 EST


vmx_flush_pml_buffer repeatedly calls kvm_vcpu_mark_page_dirty, which
SRCU-dereferences kvm->memslots. In order to give the compiler more
freedom to optimize the function, SRCU-dereference kvm->memslots only
once, before the loop.

Reviewed-by: Makarand Sonare <makarandsonare@xxxxxxxxxx>
Signed-off-by: Ben Gardon <bgardon@xxxxxxxxxx>

---

Tested by running the dirty_log_perf_test selftest on a dual socket Intel
Skylake machine:
./dirty_log_perf_test -v 4 -b 30G -i 5

The test was run 5 times with and without this patch and the dirty
memory time for iterations 2-5 was averaged across the 5 runs.
Iteration 1 was discarded for this analysis because it is still dominated
by the time spent populating memory.

The average times for the runs showed a bimodal distribution, with
clusters around 2 seconds and 2.5 seconds. This may have been a
result of vCPU migration between NUMA nodes.

In any case, the dirty memory times with this patch averaged 2.07
seconds, a 7% savings from the 2.22 second average without this patch.

While these savings may be partly a result of the patched runs having
one more run in the 2 second cluster, the patched runs in the higher
cluster were also 7-8% shorter than those in the unpatched case.

Below is the raw data for anyone interested in visualizing the results
with a graph:
Iteration   Baseline      Patched
2           2.038562907   2.045226614
3           2.037363248   2.045033709
4           2.037176331   1.999783966
5           1.999891981   2.007849104
2           2.569526298   2.001252504
3           2.579110209   2.008541897
4           2.585883731   2.005317983
5           2.588692727   2.007100987
2           2.011914370   2.006953735
3           2.012972236   2.045401530
4           1.968836017   2.005035246
5           1.967915154   2.003859551
2           2.037533296   1.991275846
3           2.501480125   2.391886691
4           2.454382587   2.391904789
5           2.461046772   2.398767963
2           2.036991484   2.011331436
3           2.002954418   2.002635687
4           2.053342717   2.006769959
5           2.522539759   2.006470059
Average     2.223405818   2.069119963
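For reference, the averages and the quoted ~7% savings can be
reproduced from the raw data with a short Python sketch (data copied
verbatim from the table above; nothing kernel-specific assumed):

```python
# Per-iteration dirty memory times (seconds): 5 runs x iterations 2-5.
baseline = [
    2.038562907, 2.037363248, 2.037176331, 1.999891981,
    2.569526298, 2.579110209, 2.585883731, 2.588692727,
    2.011914370, 2.012972236, 1.968836017, 1.967915154,
    2.037533296, 2.501480125, 2.454382587, 2.461046772,
    2.036991484, 2.002954418, 2.053342717, 2.522539759,
]
patched = [
    2.045226614, 2.045033709, 1.999783966, 2.007849104,
    2.001252504, 2.008541897, 2.005317983, 2.007100987,
    2.006953735, 2.045401530, 2.005035246, 2.003859551,
    1.991275846, 2.391886691, 2.391904789, 2.398767963,
    2.011331436, 2.002635687, 2.006769959, 2.006470059,
]

avg_baseline = sum(baseline) / len(baseline)
avg_patched = sum(patched) / len(patched)
savings = (avg_baseline - avg_patched) / avg_baseline

print(f"baseline: {avg_baseline:.9f}")  # 2.223405818
print(f"patched:  {avg_patched:.9f}")   # 2.069119963
print(f"savings:  {savings:.1%}")       # 6.9%
```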

arch/x86/kvm/vmx/vmx.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..46c54802dfdb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5692,6 +5692,7 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct kvm_memslots *memslots;
u64 *pml_buf;
u16 pml_idx;

@@ -5707,13 +5708,18 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
else
pml_idx++;

+ memslots = kvm_vcpu_memslots(vcpu);
+
pml_buf = page_address(vmx->pml_pg);
for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
+ struct kvm_memory_slot *memslot;
u64 gpa;

gpa = pml_buf[pml_idx];
WARN_ON(gpa & (PAGE_SIZE - 1));
- kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
+
+ memslot = __gfn_to_memslot(memslots, gpa >> PAGE_SHIFT);
+ mark_page_dirty_in_slot(vcpu->kvm, memslot, gpa >> PAGE_SHIFT);
}

/* reset PML index */
--
2.30.0.365.g02bc693789-goog