Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

From: Mingwei Zhang
Date: Mon Oct 02 2023 - 15:03:31 EST


On Mon, Oct 2, 2023 at 8:23 AM David Dunn <daviddunn@xxxxxxxxxx> wrote:
>
> On Mon, Oct 2, 2023 at 6:30 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> >
> > The host OS shouldn't offer facilities that severely limit its own capabilities,
> > when there's a better solution. We don't give the FPU to apps exclusively either,
> > it would be insanely stupid for a platform to do that.
> >
>
> If you think of the guest VM as a usermode application (which it
> effectively is), the analogous situation is that there is no way to
> tell the usermode application which portions of the FPU state might be
> used by the kernel without context switching. Although the kernel can
> and does use FPU state, it doesn't zero out a portion of that state
> whenever the kernel needs to use the FPU.
>
> Today there is no way for a guest to dynamically adjust which PMU
> state is valid or invalid. And this changes based on usage by other
> commands run on the host. As observed by perf subsystem running in
> the guest kernel, this looks like counters that simply zero out and
> stop counting at random.
>
> I think the request here is that there be a way for KVM to be able to
> tell the guest kernel (running the perf subsystem) that it has a
> functional HW PMU. And for that to be true. This doesn't mean taking
> away the use of the PMU any more than exposing the FPU to usermode
> applications means taking away the FPU from the kernel. But it does
> mean that when entering the KVM run loop, the host perf system needs
> to context switch away the host PMU state and allow KVM to load the
> guest PMU state. And much like the FPU situation, the portion of the
> host kernel that runs between the context switch to the KVM thread and
> VMENTER to the guest cannot use the PMU.
>
> This obviously should be a policy set by the host owner. They are
> deliberately giving up the ability to profile that small portion of
> the host (KVM VCPU thread cannot be profiled) in return for providing a
> full set of perf functionality to the guest kernel.
>

+1

I was pretty confused until I read this one. Pass-through vPMU for the
guest VM does not conflict with the host PMU software. All we need is
to accept that host PMU software (the perf subsystem in Linux) can
co-exist with pass-through vPMU in KVM. Both could work directly on
the hardware PMU, programming its registers and so on.

To achieve this, I think what we are really asking of the perf
subsystem in Linux is two things:
- Full context switch for the hardware PMU. Currently, the perf
subsystem is the exclusive owner of this piece of hardware, so this
code is missing.
- NMI sharing or NMI control transfer. Either KVM could install its own
NMI handler and have control transferred to it, or Linux promotes the
existing NMI handler to serve two entities in the kernel.
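
As a rough illustration of the two items above, here is a user-space
toy model, not kernel code. Every name in it (struct pmu_hw,
pmu_switch_to_guest(), pmu_nmi() and so on) is made up for this sketch
and does not exist in perf or KVM. It assumes the PMU state can be
saved and restored as one block and that a single shared NMI handler
routes each PMI by the current owner; that is only one of the two NMI
options listed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_COUNTERS 4 /* hypothetical number of general-purpose counters */

/* Toy stand-in for the PMU register file; not a real kernel structure. */
struct pmu_state {
	uint64_t counter[NR_COUNTERS];
	uint64_t evtsel[NR_COUNTERS];
	uint64_t global_ctrl;
};

enum pmu_owner { PMU_OWNER_HOST, PMU_OWNER_GUEST };

/* What is currently loaded in "hardware", and who owns the next PMI. */
struct pmu_hw {
	struct pmu_state regs;
	enum pmu_owner owner;
};

/* Saved images for both sides, plus per-owner PMI counts. */
struct pmu_ctx {
	struct pmu_state host_saved;
	struct pmu_state guest_saved;
	unsigned int host_pmis;
	unsigned int guest_pmis;
};

/* Before VM entry: save the host image, load the guest image, hand over PMIs. */
static void pmu_switch_to_guest(struct pmu_hw *hw, struct pmu_ctx *ctx)
{
	assert(hw->owner == PMU_OWNER_HOST);
	ctx->host_saved = hw->regs;
	hw->regs = ctx->guest_saved;
	hw->owner = PMU_OWNER_GUEST;
}

/* After VM exit: save the guest image, restore the host image, take PMIs back. */
static void pmu_switch_to_host(struct pmu_hw *hw, struct pmu_ctx *ctx)
{
	assert(hw->owner == PMU_OWNER_GUEST);
	ctx->guest_saved = hw->regs;
	hw->regs = ctx->host_saved;
	hw->owner = PMU_OWNER_HOST;
}

/* Shared NMI handler: account the PMI to whoever owns the PMU right now. */
static enum pmu_owner pmu_nmi(struct pmu_hw *hw, struct pmu_ctx *ctx)
{
	if (hw->owner == PMU_OWNER_GUEST)
		ctx->guest_pmis++;
	else
		ctx->host_pmis++;
	return hw->owner;
}
```

In this model neither side sees the other's counter values: what the
guest counted while it owned the PMU ends up in guest_saved, and the
host image comes back exactly as it was saved.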

Once the above is achieved, I believe KVM and the perf subsystem in
Linux could share the hardware PMU harmoniously, instead of forcing
the former to be a client of the latter.

To step back a little bit, we are not asking whether this is feasible:
KVM and the perf subsystem sharing the hardware PMU is already a
reality because of TDX/SEV-SNP. So I think all of the above is just a
draft proposal to make the sharing clean and efficient.

Thanks.
-Mingwei

> Dave Dunn