Re: [bug report] GICv4.1: VM performance degradation due to not trapping vCPU WFI

From: sundongxu (A)
Date: Thu Jan 18 2024 - 02:56:43 EST


On 2024/1/18 0:50, Oliver Upton wrote:
> On Wed, Jan 17, 2024 at 10:20:32PM +0800, sundongxu (A) wrote:
>> On 2024/1/16 19:13, Marc Zyngier wrote:
>>> On Tue, 16 Jan 2024 03:26:08 +0000, "sundongxu (A)" <sundongxu3@xxxxxxxxxx> wrote:
>>>> We found a problem with GICv4/4.1. For example:
>>>> we use QEMU to start a VM (4 vCPUs and 8G memory); the VM disk is
>>>> configured with virtio and the network with vhost-net. The CPU
>>>> affinity of the vCPUs and emulator is as follows, in the VM XML:
>
> <snip>
>
>>>> <cputune>
>>>> <vcpupin vcpu='0' cpuset='4'/>
>>>> <vcpupin vcpu='1' cpuset='5'/>
>>>> <vcpupin vcpu='2' cpuset='6'/>
>>>> <vcpupin vcpu='3' cpuset='7'/>
>>>> <emulatorpin cpuset='4,5,6,7'/>
>>>> </cputune>
>
> </snip>
>
>>> Effectively, we apply the same principle to vSGIs as to vLPIs, and it
>>> was found that this heuristic was pretty beneficial to vLPIs. I'm a
>>> bit surprised that vSGIs are so different in their usage pattern.
>>
>> IMO, the point is the hypervisor not trapping vCPU WFI, rather than
>> the vSGI/vLPI usage pattern.
>
> Sure, that's what's affecting your use case, but the logic in the kernel
> came about because improving virtual interrupt injection has been found
> to be generally useful.
>
>>>
>>> Does it help if you move your "emulatorpin" to some other physical
>>> CPUs?
>>
>> Yes, it does, in kernel 5.10 or 6.5-rc1.
>
> Won't your VM have a poor experience in this configuration regardless of WFx
> traps? The value of vCPU pinning is to *isolate* the vCPU threads from
> noise/overheads of the host and scheduler latencies. Seems to me that
> VMM overhead threads are being forced to take time away from the guest.

Yes: when the emulator threads and the vCPUs are pinned to the same
pCPUs, VM performance is worse than when they are pinned to different
pCPUs. The emulator threads steal time from the vCPUs, since we need
them to handle I/O and network requests. But if we allocate 4 pCPUs to
one VM, we do not want its emulator threads to run on other pCPUs, where
they may interfere with other VMs. Maybe SPDK/DPDK would alleviate the
issue.

>
> Nevertheless, disabling WFI traps isn't going to work well for
> overcommitted scenarios. The thought of tacking on more hacks in KVM has me
> a bit uneasy, perhaps instead we can give userspace an interface to explicitly
> enable/disable WFx traps and let it pick a suitable policy.

Agreed. I added a KVM parameter to do that; by default, vCPU WFI is trapped.
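
To make the idea concrete, here is a rough user-space model of such a
policy knob. It is only a sketch, not the actual patch: the names
wfi_trap_policy and wfx_traps_on_load are made up for illustration, and
the two boolean inputs (whether the vCPU is the only runnable task on
its pCPU, and whether direct vLPI/vSGI injection is in use) are
assumptions about what the existing heuristic looks at. Only the
HCR_EL2 bit positions for TWI and TWE come from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 trap bits as defined by the Arm architecture. */
#define HCR_TWI (UINT64_C(1) << 13) /* trap guest WFI to EL2 */
#define HCR_TWE (UINT64_C(1) << 14) /* trap guest WFE to EL2 */

/* Hypothetical policy knob; the name is invented for this sketch. */
enum wfi_trap_policy {
	WFI_TRAP_ALWAYS,    /* default: always trap vCPU WFI */
	WFI_TRAP_HEURISTIC, /* let the direct-injection heuristic decide */
};

/*
 * Hypothetical helper: compute the WFx trap bits when a vCPU is loaded.
 *
 * single_task      - assumed input: the vCPU is alone on its pCPU
 * direct_injection - assumed input: vLPIs/vSGIs are injected directly
 */
static uint64_t wfx_traps_on_load(uint64_t hcr, enum wfi_trap_policy policy,
				  bool single_task, bool direct_injection)
{
	/* Other tasks want this pCPU: trap both WFI and WFE. */
	if (!single_task)
		return hcr | HCR_TWE | HCR_TWI;

	/* Alone on the pCPU: let the guest execute WFE natively. */
	hcr &= ~HCR_TWE;

	/* Skip the WFI trap only if the policy allows the heuristic. */
	if (policy == WFI_TRAP_HEURISTIC && direct_injection)
		hcr &= ~HCR_TWI;
	else
		hcr |= HCR_TWI;

	return hcr;
}
```

With WFI_TRAP_ALWAYS, the WFI trap stays set even when direct injection
is in use, which covers our shared-pCPU case. WFI_TRAP_HEURISTIC keeps
the current trade-off for setups where the vCPUs really are isolated.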

Thanks,
Dongxu