Re: [RFC PATCH 0/6] Improve VM DVFS and task placement behavior

From: Pavan Kondeti
Date: Thu Apr 27 2023 - 03:47:14 EST


On Thu, Mar 30, 2023 at 03:43:35PM -0700, David Dai wrote:
> Hi,
>
> This patch series is a continuation of the talk Saravana gave at LPC 2022
> titled "CPUfreq/sched and VM guest workload problems" [1][2][3]. The gist
> of the talk is that workloads running in a guest VM get terrible task
> placement and DVFS behavior when compared to running the same workload in
> the host. Effectively, no EAS for threads inside VMs. This would make power
> and performance terrible just by running the workload in a VM even if we
> assume there is zero virtualization overhead.
>
> We have been iterating over different options for communicating between
> guest and host, ways of applying the information coming from the
> guest/host, etc to figure out the best performance and power improvements
> we could get.
>
> The patch series in its current state is NOT meant for landing in the
> upstream kernel. We are sending this patch series to share the current
> progress and data we have so far. The patch series is meant to be easy to
> cherry-pick and test on various devices to see what performance and power
> benefits this might give for others.
>
> With this series, a workload running in a VM gets the same task placement
> and DVFS treatment as it would when running in the host.
>
> As expected, we see significant performance improvement and better
> performance/power ratio. If anyone else wants to try this out for your VM
> workloads and report findings, that'd be very much appreciated.
>
> The idea is to improve VM CPUfreq/sched behavior by:
> - Having the guest kernel do accurate load tracking by taking host CPU
> arch/type and frequency into account.
> - Sharing vCPU run queue utilization information with the host so that the
> host can do proper frequency scaling and task placement on the host side.
>

[...]

>
> Next steps:
> ===========
> We are continuing to look into communication mechanisms other than
> hypercalls that are just as/more efficient and avoid switching into the VMM
> userspace. Any inputs in this regard are greatly appreciated.
>

I am trying to understand why virtio-based cpufreq would not work here.
The VMM on the host can process requests from the guest VM, such as
querying the frequency table and the current frequency, and setting
min_freq. I believe the virtio backend has mechanisms for acceleration
(vhost), so that user space is not involved in every frequency request
from the guest.

It has the advantages of (1) being hypervisor agnostic (it is just
virtio), and (2) not requiring any additional input to the scheduler:
the aggregated min_freq requests from all guests should be sufficient.
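
To make (2) concrete, here is a rough sketch of what I mean by
aggregation. This is not code from the series or from any existing
virtio device; struct guest_freq_req and aggregate_min_freq() are
made-up names, and "highest requested floor wins, clamped to the host
range" is my assumption about the policy. It is similar in spirit to how
multiple minimum-frequency requests are usually resolved.

```c
#include <stddef.h>

/* Hypothetical: one min_freq request (in kHz) received from a guest. */
struct guest_freq_req {
	unsigned int min_freq_khz;
};

/*
 * Aggregate the min_freq requests of all guests that share one host
 * cpufreq policy. Assumed policy: the highest requested floor wins,
 * and the result is clamped to the host's [min, max] range.
 */
unsigned int aggregate_min_freq(const struct guest_freq_req *reqs,
				size_t nr_reqs,
				unsigned int host_min_khz,
				unsigned int host_max_khz)
{
	unsigned int floor_khz = host_min_khz;
	size_t i;

	/* Take the largest floor requested by any guest. */
	for (i = 0; i < nr_reqs; i++) {
		if (reqs[i].min_freq_khz > floor_khz)
			floor_khz = reqs[i].min_freq_khz;
	}

	/* Never ask the host for more than it can deliver. */
	if (floor_khz > host_max_khz)
		floor_khz = host_max_khz;

	return floor_khz;
}
```

With something like this on the host side, the result would simply be
applied as the minimum frequency of the affected policy, and the host
scheduler would not need to know about guest utilization at all.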

>
> [1] - https://lpc.events/event/16/contributions/1195/
> [2] - https://lpc.events/event/16/contributions/1195/attachments/970/1893/LPC%202022%20-%20VM%20DVFS.pdf
> [3] - https://www.youtube.com/watch?v=hIg_5bg6opU
> [4] - https://chromium-review.googlesource.com/c/crosvm/crosvm/+/4208668
> [5] - https://chromium-review.googlesource.com/c/crosvm/crosvm/+/4288027
>
> David Dai (6):
> sched/fair: Add util_guest for tasks
> kvm: arm64: Add support for get_cur_cpufreq service
> kvm: arm64: Add support for util_hint service
> kvm: arm64: Add support for get_freqtbl service
> dt-bindings: cpufreq: add bindings for virtual kvm cpufreq
> cpufreq: add kvm-cpufreq driver
>
> .../bindings/cpufreq/cpufreq-virtual-kvm.yaml | 39 +++
> Documentation/virt/kvm/api.rst | 28 ++
> .../virt/kvm/arm/get_cur_cpufreq.rst | 21 ++
> Documentation/virt/kvm/arm/get_freqtbl.rst | 23 ++
> Documentation/virt/kvm/arm/index.rst | 3 +
> Documentation/virt/kvm/arm/util_hint.rst | 22 ++
> arch/arm64/include/uapi/asm/kvm.h | 3 +
> arch/arm64/kvm/arm.c | 3 +
> arch/arm64/kvm/hypercalls.c | 60 +++++
> drivers/cpufreq/Kconfig | 13 +
> drivers/cpufreq/Makefile | 1 +
> drivers/cpufreq/kvm-cpufreq.c | 245 ++++++++++++++++++
> include/linux/arm-smccc.h | 21 ++
> include/linux/sched.h | 12 +
> include/uapi/linux/kvm.h | 3 +
> kernel/sched/core.c | 24 +-
> kernel/sched/fair.c | 15 +-
> tools/arch/arm64/include/uapi/asm/kvm.h | 3 +
> 18 files changed, 536 insertions(+), 3 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> create mode 100644 Documentation/virt/kvm/arm/get_cur_cpufreq.rst
> create mode 100644 Documentation/virt/kvm/arm/get_freqtbl.rst
> create mode 100644 Documentation/virt/kvm/arm/util_hint.rst
> create mode 100644 drivers/cpufreq/kvm-cpufreq.c

Thanks,
Pavan