Re: [PATCH v5 06/13] KVM: x86/vmx: Save/Restore host MSR_ARCH_LBR_CTL state

From: Like Xu
Date: Wed Jul 14 2021 - 09:55:55 EST


On 14/7/2021 1:12 am, Jim Mattson wrote:
On Tue, Jul 13, 2021 at 3:16 AM Like Xu <like.xu.linux@xxxxxxxxx> wrote:

On 13/7/2021 5:47 pm, Yang Weijiang wrote:
On Mon, Jul 12, 2021 at 10:23:02AM -0700, Jim Mattson wrote:
On Mon, Jul 12, 2021 at 2:36 AM Yang Weijiang <weijiang.yang@xxxxxxxxx> wrote:

On Fri, Jul 09, 2021 at 03:54:53PM -0700, Jim Mattson wrote:
On Fri, Jul 9, 2021 at 2:51 AM Yang Weijiang <weijiang.yang@xxxxxxxxx> wrote:

If host is using MSR_ARCH_LBR_CTL then save it before vm-entry
and reload it after vm-exit.

I don't see anything being done here "before VM-entry" or "after
VM-exit." This code seems to be invoked on vcpu_load and vcpu_put.

In any case, I don't see why this one MSR is special. It seems that if
the host is using the architectural LBR MSRs, then *all* of the host
architectural LBR MSRs have to be saved on vcpu_load and restored on
vcpu_put. Shouldn't kvm_load_guest_fpu() and kvm_put_guest_fpu() do
that via the calls to kvm_save_current_fpu(vcpu->arch.user_fpu) and
restore_fpregs_from_fpstate(&vcpu->arch.user_fpu->state)?
I looked back at the discussion thread:
https://patchwork.kernel.org/project/kvm/patch/20210303135756.1546253-8-like.xu@xxxxxxxxxxxxxxx/
I'm not sure why this code was added, but IMO, although the fpu save/restore
in the outer loop covers this LBR MSR, the operation points are far away from
the vm-entry/exit points, i.e., the guest MSR setting could leak to the host
side for a significantly long time, which may hurt host-side profiling
accuracy. If we save/restore it manually, it will mitigate the issue
significantly.

I'll be interested to see how you distinguish the intermingled branch
streams, if you allow the host to record LBRs while the LBR MSRs
contain guest values!

It is fine for the guest that the real LBR MSRs still contain the guest
values even after vm-exit, as long as there is no other LBR user in the
current thread.

(The perf subsystem makes this data visible only to the current thread)

Except for MSR_ARCH_LBR_CTL, we don't want to add MSR switch overhead to
the VMX transition (just think about {from, to, info} * 32 entries).

If we have another LBR user (such as "perf kvm") in the current thread,
the host and guest LBR users will create separate LBR events that compete
for the LBR in the current thread.

The final arbiter is the host perf scheduler. The host perf will
save/restore the contents of the LBR when switching between two
LBR events.

Indeed, if the LBR hardware is assigned to the host LBR event before
vm-entry, then the guest LBR feature will be broken and a warning
will be triggered on the host.

Are you saying that the guest LBR feature only works some of the time?

If other LBR events preempt the KVM-owned LBR event on the current CPU,
we lose the guest LBR feature for the next vm-entry time slice.

How are failures communicated to the guest? If this feature doesn't
follow the architectural specification, perhaps you should consider
offering a paravirtual feature instead.

The failure notification *wouldn't* bring anything meaningful, because the
guest has already lost the hardware support and the LBR data it captured.


Warnings on the host, by the way, are almost completely useless. How
do I surface such a warning to a customer who has a misbehaving VM? At
the very least, user space should be notified of KVM emulation errors,
so I can get an appropriate message to the customer.

We have recommended in previous legacy LBR enabling commits that CSP
administrators would be better off not using LBR on the host, so as not to
interfere with a guest that has its own LBR enabled.


LBR is an exclusive hardware resource and cannot be shared between
different host/guest lbr_select configurations.

In that case, it definitely sounds like guest architectural LBRs
should be a paravirtual feature, since you can't actually virtualize
the hardware.

My earlier plan was to use the VMX MSR switch to handle the switching of
the LBR MSRs when another LBR event is competing (maybe with XSAVES helping
a lot with the overhead), and we expected that this competitive window
would be very short-lived. The alternative is to force host perf to always
make the guest demand the first-priority user in every situation, which is
even harder to get PeterZ to buy into.


I'll check whether an inner, simplified xsave/restore of the guest/host LBR
MSRs is meaningful; the worst case is to drop this patch, since it's not
correct to only restore the host LBR ctl while still leaving guest LBR data
in the MSRs. Thanks for the reminder!