Re: [PATCH v4 0/2] Improve VM CPUfreq and task placement behavior

From: Hongyan Xia
Date: Mon Nov 13 2023 - 07:20:38 EST


Hi David,

On 11/11/2023 01:49, David Dai wrote:
Hi,

This patch series is a continuation of the talk Saravana gave at LPC 2022
titled "CPUfreq/sched and VM guest workload problems" [1][2][3]. The gist
of the talk is that workloads running in a guest VM get terrible task
placement and CPUfreq behavior when compared to running the same workload
in the host. Effectively, no EAS(Energy Aware Scheduling) for threads
inside VMs. This would make power and performance terrible just by running
the workload in a VM even if we assume there is zero virtualization
overhead.

With this series, a workload running in a VM gets the same task placement
and CPUfreq behavior as it would when running in the host.

The idea is to improve VM CPUfreq/sched behavior by:
- Having guest kernel do accurate load tracking by taking host CPU
arch/type and frequency into account.
- Sharing vCPU frequency requirements with the host so that the
host can do proper frequency scaling and task placement on the host side.

Based on feedback from RFC v1 proposal[4], we've revised our
implementation to using MMIO reads and writes to pass information
from/to host instead of using hypercalls. In our example, the
VMM(Virtual Machine Manager) translates the frequency requests into
Uclamp_min and applies it to the vCPU thread as a hint to the host
kernel.

Sorry for not noticing this series until now.

The problem you are having with uclamp is actually the same as what
I'm tackling right now. Basically my conclusion so far is that uclamp
max aggregation faces quite many problems, which can be easily solved by
sum aggregation (summing up the clamped utilization values instead of
applying the max uclamp value to the whole rq):

https://lore.kernel.org/all/cover.1696345700.git.Hongyan.Xia2@xxxxxxx/

What you described as util_guest sounds to me as exactly what uclamp_min
under sum aggregation does. I'm really tempted to ask you to apply my
series and see if the new uclamp_min does what you want, instead of
introducing a new util_guest signal. If you have no time for this I can
try to replicate your setup and do the experiments myself.

Also, my knowledge with KVM is limited. May I know where the vCPU fork
happens? Can't you just set the p->sched_reset_on_fork flag on fork to
not carry forward the uclamp values?


[...]
Hongyan