Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

From: David Dunn
Date: Mon Oct 02 2023 - 11:23:35 EST


On Mon, Oct 2, 2023 at 6:30 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> The host OS shouldn't offer facilities that severely limit its own capabilities,
> when there's a better solution. We don't give the FPU to apps exclusively either,
> it would be insanely stupid for a platform to do that.
>

If you think of the guest VM as a usermode application (which it
effectively is), the analogous situation is that there is no way to
tell the usermode application which portions of the FPU state might be
used by the kernel without context switching. Although the kernel can
and does use FPU state, it doesn't zero out a portion of that state
whenever the kernel needs to use the FPU.

Today there is no way for a guest to dynamically adjust which PMU
state is valid or invalid. And this changes based on usage by other
commands run on the host. As observed by the perf subsystem running in
the guest kernel, this looks like counters that simply zero out and
stop counting at random.

I think the request here is that there be a way for KVM to tell the
guest kernel (running the perf subsystem) that it has a functional HW
PMU, and for that to actually be true. This doesn't mean taking
away the use of the PMU any more than exposing the FPU to usermode
applications means taking away the FPU from the kernel. But it does
mean that when entering the KVM run loop, the host perf system needs
to context switch away the host PMU state and allow KVM to load the
guest PMU state. And much like the FPU situation, the portion of the
host kernel that runs between the context switch to the KVM thread and
VMENTER to the guest cannot use the PMU.
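
To make the save/restore idea concrete, here is a toy model in plain C.
It is only a sketch under stated assumptions: every name in it
(toy_cpu, toy_vcpu_enter, toy_vcpu_exit, NUM_COUNTERS) is hypothetical
and none of it is KVM or perf code. It just shows the host PMU state
being switched out before guest entry and switched back in after exit,
so neither side sees its counters zeroed by the other.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COUNTERS 4 /* hypothetical; real PMUs differ */

/* Toy PMU context: counter values plus their event selectors. */
struct pmu_state {
	uint64_t counter[NUM_COUNTERS];
	uint64_t event_select[NUM_COUNTERS];
};

/* Toy CPU: one "hardware" PMU plus a saved context for each owner. */
struct toy_cpu {
	struct pmu_state hw;          /* what the counters currently hold */
	struct pmu_state host_saved;  /* host context while guest runs */
	struct pmu_state guest_saved; /* guest context while host runs */
	int guest_owns_pmu;
};

/* Stand-in for the step before VMENTER: save host state, load guest. */
static void toy_vcpu_enter(struct toy_cpu *cpu)
{
	assert(!cpu->guest_owns_pmu);
	cpu->host_saved = cpu->hw;
	cpu->hw = cpu->guest_saved;
	cpu->guest_owns_pmu = 1;
}

/* Stand-in for the step after VMEXIT: save guest state, reload host. */
static void toy_vcpu_exit(struct toy_cpu *cpu)
{
	assert(cpu->guest_owns_pmu);
	cpu->guest_saved = cpu->hw;
	cpu->hw = cpu->host_saved;
	cpu->guest_owns_pmu = 0;
}

/* Events land on whichever context is currently loaded. */
static void toy_count(struct toy_cpu *cpu, int idx, uint64_t n)
{
	cpu->hw.counter[idx] += n;
}
```

In this model, events counted between toy_vcpu_enter() and
toy_vcpu_exit() are credited only to the guest context, and the host
values come back untouched afterwards. The window in which the host
cannot use the PMU is exactly the span between those two calls.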

This obviously should be a policy set by the host owner. They are
deliberately giving up the ability to profile that small portion of
the host (the KVM VCPU thread cannot be profiled) in return for
providing a full set of perf functionality to the guest kernel.

Dave Dunn