[PATCH v3 0/5] perf: KVM: Enable callchains for guests

From: Tianyi Liu
Date: Sun Dec 10 2023 - 03:09:02 EST


This patch series enables callchains for guests (used by `perf kvm`),
which holds the top spot on the perf wiki TODO list [1]. It allows users
to collect guest OS callchains and analyze guest performance from outside
the guest using PMU events. This is also useful for guests, such as
unikernels, that lack a performance event subsystem.

The event processing flow is as follows (shown as backtrace):
@0 kvm_arch_vcpu_get_unwind_info / kvm_arch_vcpu_read_virt (per arch impl)
@1 kvm_guest_get_unwind_info / kvm_guest_read_virt
<callback function pointers in `struct perf_guest_info_callbacks`>
@2 perf_guest_get_unwind_info / perf_guest_read_virt
@3 perf_callchain_guest
@4 get_perf_callchain
@5 perf_callchain

Between @0 and @1 is the interface between KVM and the arch-specific
impl, while between @1 and @2 is the interface between Perf and KVM.
The 1st patch implements @0. The 2nd patch extends the interfaces between
@1 and @2, while the 3rd patch implements @1. The 4th patch implements @3
and modifies @4 and @5. The last patch is for the userspace tools.
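
To illustrate the layering between @0, @1 and @2, here is a minimal
userspace sketch. It is not code from the patches: every `mock_*` name,
the struct layouts, and the fake vCPU state are assumptions made up for
illustration, and the real `struct perf_guest_info_callbacks` has other
members. The only point is that perf reaches the arch code through
function pointers registered by the hypervisor, and has to cope with no
hypervisor being registered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the register state an unwinder needs, packaged as one struct. */
struct mock_unwind_info {
	uint64_t ip;	/* guest instruction pointer */
	uint64_t fp;	/* guest frame pointer */
	uint64_t sp;	/* guest stack pointer */
};

/* Hypothetical callback table with two slots; not the kernel's layout. */
struct mock_guest_cbs {
	bool (*get_unwind_info)(struct mock_unwind_info *info);
	bool (*read_virt)(uint64_t addr, void *dst, size_t len);
};

/* Fake vCPU registers and 16 bytes of fake guest memory at MOCK_MEM_BASE. */
#define MOCK_MEM_BASE 0x7000u
static const struct mock_unwind_info mock_vcpu_regs = {
	.ip = 0x401000, .fp = 0x7000, .sp = 0x6ff0,
};
static const uint8_t mock_guest_mem[16] = {
	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
};

/* @0: per-arch layer. */
static bool mock_arch_get_unwind_info(struct mock_unwind_info *info)
{
	*info = mock_vcpu_regs;
	return true;
}

static bool mock_arch_read_virt(uint64_t addr, void *dst, size_t len)
{
	uint64_t off = addr - MOCK_MEM_BASE;

	/* Reject anything outside the fake mapping instead of faulting. */
	if (addr < MOCK_MEM_BASE || off > sizeof(mock_guest_mem) ||
	    len > sizeof(mock_guest_mem) - off)
		return false;
	memcpy(dst, mock_guest_mem + off, len);
	return true;
}

/* @1: generic hypervisor layer, forwarding to the per-arch layer. */
static bool mock_kvm_get_unwind_info(struct mock_unwind_info *info)
{
	return mock_arch_get_unwind_info(info);
}

static bool mock_kvm_read_virt(uint64_t addr, void *dst, size_t len)
{
	return mock_arch_read_virt(addr, dst, len);
}

static const struct mock_guest_cbs mock_kvm_cbs = {
	.get_unwind_info = mock_kvm_get_unwind_info,
	.read_virt = mock_kvm_read_virt,
};

/* Table registered by the hypervisor; NULL while none is registered. */
static const struct mock_guest_cbs *mock_registered_cbs;

/* @2: perf-side helpers, dispatching through the registered pointers. */
static bool mock_perf_get_unwind_info(struct mock_unwind_info *info)
{
	return mock_registered_cbs &&
	       mock_registered_cbs->get_unwind_info(info);
}

static bool mock_perf_read_virt(uint64_t addr, void *dst, size_t len)
{
	return mock_registered_cbs &&
	       mock_registered_cbs->read_virt(addr, dst, len);
}
```

In this sketch, setting `mock_registered_cbs = &mock_kvm_cbs` plays the
role of the hypervisor registering its callbacks. Until that happens,
both perf-side helpers simply report failure.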

arm64 does not yet provide some of the required foundational
infrastructure (an interface for reading from a guest virtual address).
Because adding it is somewhat complex, the arm64 implementation is
stubbed for now and will be implemented later.

For safety, this feature treats guests as read-only: we never inject page
faults into a guest, which ensures that profiling does not interfere with
it. In extremely rare cases, if the guest is modifying its page tables
while a sample is taken, incorrect data may be read. Additionally, if
programs running in the guest OS are built without frame pointers, the
unwinder may also produce some erroneous data. Such erroneous entries
eventually appear as `[unknown]` in the report. For profiling purposes,
it is sufficient that most of the records are correct.
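
As a rough illustration of why a failed or incorrect read is harmless,
here is a minimal userspace sketch of a frame-pointer walk over read-only
guest memory. It is not the code from this series: the `mock_*` names,
the fake stack contents, and the x86-64 style frame layout (saved frame
pointer followed by return address) are assumptions for illustration.
When a read fails, the walk just stops instead of faulting anything in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_GUEST_BASE 0x1000u

/* Fake guest stack: three frames, the last one linking to unmapped memory. */
static const uint64_t mock_guest_stack[8] = {
	0x1010, 0xffffffff81000111ULL,	/* frame at 0x1000 */
	0x1020, 0xffffffff81000222ULL,	/* frame at 0x1010 */
	0x9000, 0xffffffff81000333ULL,	/* frame at 0x1020 */
	0, 0,
};

/* Read-only accessor: returns false rather than faulting on a bad address. */
static bool mock_read_virt(uint64_t addr, void *dst, size_t len)
{
	uint64_t size = sizeof(mock_guest_stack);
	uint64_t off = addr - MOCK_GUEST_BASE;

	if (addr < MOCK_GUEST_BASE || off > size || len > size - off)
		return false;
	memcpy(dst, (const unsigned char *)mock_guest_stack + off, len);
	return true;
}

/* Assumed frame record layout: saved frame pointer, then return address. */
struct mock_frame {
	uint64_t next_fp;
	uint64_t ret_addr;
};

/* Stores up to max entries in out[], starting with ip; returns the count. */
static size_t mock_unwind(uint64_t ip, uint64_t fp, uint64_t *out, size_t max)
{
	size_t n = 0;

	if (max == 0)
		return 0;
	out[n++] = ip;

	while (n < max) {
		struct mock_frame frame;

		/* Unreadable frame: keep what we have and stop. */
		if (!mock_read_virt(fp, &frame, sizeof(frame)))
			break;
		out[n++] = frame.ret_addr;

		/* The stack grows down, so a valid chain only moves up. */
		if (frame.next_fp <= fp)
			break;
		fp = frame.next_fp;
	}
	return n;
}
```

Starting from `fp = 0x1000`, this walk records the sampled `ip` plus three
return addresses, then stops when the next frame pointer cannot be read.
A truncated or partly wrong chain like this is what would later show up
as `[unknown]` entries.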

Regarding the necessity of implementing this in the kernel:
It would indeed be possible to implement this in userspace, accessing the
guest VM through the KVM APIs to interrupt the guest and perform the
unwinding. However, that approach introduces higher latency, and the
syscall overhead could limit the sampling frequency. Moreover, it appears
that userspace can only interrupt the vCPU at a certain frequency, without
fully leveraging the richness of the PMU's performance events. If the
logic lives in the kernel, on the other hand, `perf kvm` can bind to
various PMU events and sample with lower overhead from PMU interrupts.

Tested with both Linux and unikernels as guests; the `perf script` command
correctly shows the callchains.
FlameGraphs can also be generated using this series together with [2].

[1] https://perf.wiki.kernel.org/index.php/Todo
[2] https://github.com/brendangregg/FlameGraph

v1:
https://lore.kernel.org/kvm/SYYP282MB108686A73C0F896D90D246569DE5A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Changes since v1:
Posted the complete implementation and updated some code based on
Sean's feedback.

v2:
https://lore.kernel.org/kvm/SY4P282MB1084ECBCC1B176153B9E2A009DCFA@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Changes since v2:
Refactored the interface and packaged the info required for unwinding
into a struct; resolved some type mismatches; provided more explanations
based on the feedback on v2; performed more tests.

Tianyi Liu (5):
  KVM: Add arch specific interfaces for sampling guest callchains
  perf kvm: Introduce guest interfaces for sampling callchains
  KVM: implement new perf callback interfaces
  perf kvm: Support sampling guest callchains
  perf tools: Support PERF_CONTEXT_GUEST_* flags

 MAINTAINERS                         |  1 +
 arch/arm64/kvm/arm.c                | 12 ++++++
 arch/x86/events/core.c              | 63 ++++++++++++++++++++++++-----
 arch/x86/kvm/x86.c                  | 24 +++++++++++
 include/linux/kvm_host.h            |  5 +++
 include/linux/perf_event.h          | 20 ++++++++-
 include/linux/perf_kvm.h            | 18 +++++++++
 kernel/bpf/stackmap.c               |  8 ++--
 kernel/events/callchain.c           | 27 ++++++++++++-
 kernel/events/core.c                | 17 +++++++-
 tools/perf/builtin-timechart.c      |  6 +++
 tools/perf/util/data-convert-json.c |  6 +++
 tools/perf/util/machine.c           |  6 +++
 virt/kvm/kvm_main.c                 | 22 ++++++++++
 14 files changed, 218 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/perf_kvm.h


base-commit: 33cc938e65a98f1d29d0a18403dbbee050dcad9a
--
2.34.1