RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out

From: Shameerali Kolothum Thodi
Date: Mon Aug 09 2021 - 09:48:46 EST




> -----Original Message-----
> From: Will Deacon [mailto:will@xxxxxxxxxx]
> Sent: 09 August 2021 14:09
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; kvmarm@xxxxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; maz@xxxxxxxxxx; catalin.marinas@xxxxxxx;
> james.morse@xxxxxxx; julien.thierry.kdev@xxxxxxxxx;
> suzuki.poulose@xxxxxxx; jean-philippe@xxxxxxxxxx;
> Alexandru.Elisei@xxxxxxx; qperret@xxxxxxxxxx; Linuxarm
> <linuxarm@xxxxxxxxxx>
> Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> schedule out
>
> On Fri, Aug 06, 2021 at 12:24:36PM +0000, Shameerali Kolothum Thodi
> wrote:
> > These are some test numbers with and without this patch, run on two
> > different test setups.
> >
> >
> > a)Test Setup -1
> > -----------------------
> >
> > Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16
> > Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute
> hackbench
> > 5 times before exiting.
> >
> > Measurements taken avg. of 10 Runs.
> >
> > Image : 5.14-rc3
> > ---------------------------
> > Time(s) 44.43813888
> > No. of exits 145,348,264
> >
> > Image: 5.14-rc3 + vmid-v3
> > ----------------------------------------
> > Time(s) 46.59789034
> > No. of exits 133,587,307
> >
> > %diff against 5.14-rc3
> > Time: 4.8% more
> > Exits: 8% less
> >
> > Image: 5.14-rc3 + vmid-v3 + Without active_asid clear
> > ---------------------------------------------------------------------------
> > Time(s) 44.5031782
> > No. of exits 144,443,188
> >
> > %diff against 5.14-rc3
> > Time: 0.15% more
> > Exits: 2.42% less
> >
> > b)Test Setup -2
> > -----------------------
> >
> > Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to
> 4.
> > Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute
> hackbench
> > 5 times before exiting.
> >
> > Measurements taken avg. of 10 Runs.
> >
> > Image : 5.14-rc3-vmid4bit
> > ------------------------------------
> > Time(s) 46.19963266
> > No. of exits 23,699,546
> >
> > Image: 5.14-rc3-vmid4bit + vmid-v3
> > ---------------------------------------------------
> > Time(s) 45.83307736
> > No. of exits 23,260,203
> >
> > %diff against 5.14-rc3-vmid4bit
> > Time: 0.8% less
> > Exits: 1.85% less
> >
> > Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear
> > -----------------------------------------------------------------------------------------
> > Time(s) 44.5031782
> > No. of exits 144,443,188
>
> Really? The *exact* same numbers as the "Image: 5.14-rc3 + vmid-v3 +
> Without
> active_asid clear" configuration? Guessing a copy-paste error here.
>
> > %diff against 5.14-rc3-vmid4bit
> > Time: 1.05% less
> > Exits: 2.06% less
> >
> > As expected, the active_asid clear on schedule out is not helping.
> > But without this patch, the numbers seems to be better than the
> > vanilla kernel when we force the setup(cpus=8, vmd=4bits)
> > to perform rollover.
>
> I'm struggling a bit to understand these numbers. Are you saying that
> clearing the active_asid helps in the 16-bit VMID case but not in the
> 4-bit case?

Nope, the other way around.. The point I was trying to make is that
clearing the active_vmids definitely have an impact in 16-bit vmid
case, where rollover is not happening, as it ends up taking the slow
path more frequently.

Test setup-1, case 2(with active_vmids clear): Around 4.8% more time
to finish the test compared to vanilla kernel.

Test setup-1, case 3(Without clear): 0.15% more time compared to
vanilla kernel.

For the 4-bit vmid case, the impact of clearing vmids is not that obvious
probably because we have more rollovers.

Test setup-2, case 2(with active_vmids clear):0.8% less time compared to vanilla.
Test setup-2, case 3(Without clear): 1.05% less time compared to vanilla kernel.

So between the two(with and without clearing the active_vmids), the "without"
one has better numbers for both Test setups.

> Why would the active_asid clear have any impact on the number of exits?

In 16 bit vmid case, it looks like the no. of exits is considerably lower if we clear
active_vmids. . Not sure it is because of the frequent slow path or not. But anyway,
the time to finish the test is higher.

> The problem I see with not having the active_asid clear is that we will
> roll over more frequently as the number of reserved VMIDs increases.

Ok. The idea of running the 4-bit test setup was to capture that. It doesn't
look like it has a major impact when compared to the original kernel. May be
I should take an average of more test runs. Please let me know if there is a
better way to measure that impact.

Hope, I am clear.

Thanks,
Shameer