Re: [PATCH v2 0/4] KVM: x86: tracepoint updates

From: Sean Christopherson
Date: Tue Sep 26 2023 - 16:45:31 EST


On Tue, Sep 26, 2023, Paolo Bonzini wrote:
> On 9/26/23 10:28, Maxim Levitsky wrote:
> > > trace_kvm_exit is a good example, where despite all of the information that is captured
> > > by KVM, it's borderline worthless for CPUID and MSR exits because their interesting
> > > information is held in registers and not captured in the VMCS or VMCB.
> > >
> > > There are some BTF type info issues that I've encountered, but I suspect that's
> > > as much a PEBKAC problem as anything.
> > >
> > While eBPF has its use cases, none of the extra tracepoints were added solely because of
> > the monitoring tool and I do understand that tracepoints are a limited resource.
> >
> > Each added tracepoint/info was added only when it was also found to be useful for regular
> > kvm tracing.
>
> I am not sure about _all_ of them, but I agree with both of you.
>
> On one hand, it would be pretty cool to have eBPF access to registers. On
> the other hand, the specific info you're adding is generic and I think there
> are only a couple exceptions where I am not sure it belongs in the generic
> KVM tracepoints.

I'm not saying this information isn't useful, *sometimes*. What I'm saying is
that I don't think it's sustainable/maintainable to keep expanding KVM's tracepoints.
E.g. why trace req_immediate_exit and not mmu_invalidate_seq[*]?

It's practically impossible to predict exactly what information will be useful in
the field. And for something like kvmon, IMO the onus is on userspace to do the
heavy lifting.

Rather than hardcode mounds of information in KVM's tracepoints, I think we should
refactor KVM to make it as easy as possible to use BPF to piggyback tracepoints
and get at data that *might* be interesting, and then add a variety of BPF programs
(e.g. using bpftrace for simplicity) somewhere in tools/ to provide equivalent
functionality to select existing tracepoints.

E.g. if we rework kvm_vcpu to place "struct kvm_vcpu_arch arch" at offset '0',
then we can get at all the GPRs and pseudo-registers by hardcoding offsets into the
struct, i.e. without needing BTF type info. More elaborate trace programs would
likely need BTF, or maybe some clever shenanigans, but it seems very doable.
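
To illustrate the offset idea, here is a minimal userspace sketch in C. Everything
in it is hypothetical: mock_vcpu, mock_vcpu_arch, the register enum and the helpers
are made-up stand-ins, not the real kvm_vcpu layout. It only shows that once the
arch struct sits at offset 0, a register's offset from the vCPU pointer equals its
offset inside the arch struct, so a tracer holding an opaque pointer can read it
without type info.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint64_t */
#include <string.h>   /* memcpy */

/* Hypothetical, simplified stand-ins; NOT the real KVM definitions. */
enum mock_reg { MOCK_RAX, MOCK_RCX, MOCK_RDX, MOCK_RBX, MOCK_NR_REGS };

struct mock_vcpu_arch {
	uint64_t regs[MOCK_NR_REGS];
};

struct mock_vcpu {
	struct mock_vcpu_arch arch;	/* the proposal: arch first, at offset 0 */
	int vcpu_id;
};

/* Fail the build if someone moves arch away from offset 0. */
static_assert(offsetof(struct mock_vcpu, arch) == 0, "arch must be first");

/* Byte offset of a register, measured from the start of the vCPU. */
static size_t mock_reg_offset(enum mock_reg reg)
{
	return offsetof(struct mock_vcpu_arch, regs) +
	       (size_t)reg * sizeof(uint64_t);
}

/*
 * Read a register given only an opaque pointer and a hardcoded offset,
 * roughly what a trace program without BTF would have to do.
 */
static uint64_t mock_read_reg(const void *vcpu, enum mock_reg reg)
{
	uint64_t val;

	memcpy(&val, (const char *)vcpu + mock_reg_offset(reg), sizeof(val));
	return val;
}
```

The build-time assertion is the interesting bit: hardcoded offsets are only
tolerable if the layout they depend on can't silently change.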

[*] https://lore.kernel.org/all/ZOaUdP46f8XjQvio@xxxxxxxxxx