Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

From: Avi Kivity
Date: Tue Mar 16 2010 - 06:14:27 EST


On 03/16/2010 11:53 AM, Ingo Molnar wrote:
* Avi Kivity<avi@xxxxxxxxxx> wrote:

On 03/16/2010 09:24 AM, Ingo Molnar wrote:
* Avi Kivity<avi@xxxxxxxxxx> wrote:

On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
From: Zhang, Yanmin<yanmin_zhang@xxxxxxxxxxxxxxx>

Based on the discussion in KVM community, I worked out the patch to support
perf to collect guest os statistics from host side. This patch is implemented
with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
critical bug and provided good suggestions with other guys. I really appreciate
their kind help.

The patch adds new subcommand kvm to perf.

perf kvm top
perf kvm record
perf kvm report
perf kvm diff

The new perf could profile guest os kernel except guest os user space, but it
could summarize guest os user space utilization per guest os.

Below are some examples.
1) perf kvm top
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules top

Excellent, support for guest kernel != host kernel is critical (I
can't remember the last time I ran same kernels).

How would we support multiple guests with different kernels? Perhaps a
symbol server that perf can connect to (and that would connect to guests in
turn)?

The highest quality solution would be if KVM offered a 'guest extension' to
the guest kernel's /proc/kallsyms that made it easy for user-space to get this
information from an authoritative source.

That's the main reason why the host side /proc/kallsyms is so popular and so
useful: while in theory it's mostly redundant information which can be gleaned
from the System.map and other sources of symbol information, it's easily
available and is _always_ trustable to come from the host kernel.
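
For illustration only, here is a minimal sketch of how a tool might consume kallsyms-style text (the "address type name [module]" lines that /proc/kallsyms exposes) and resolve a sampled instruction pointer to a symbol. The function names parse_kallsyms and resolve are hypothetical and are not taken from perf or from the patch under discussion.

```python
import bisect

def parse_kallsyms(text):
    """Parse kallsyms-style text into a list of
    (address, type, name, module) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        addr = int(fields[0], 16)
        # Module symbols carry a trailing "[module]" field.
        module = fields[3].strip("[]") if len(fields) > 3 else None
        symbols.append((addr, fields[1], fields[2], module))
    symbols.sort(key=lambda s: s[0])
    return symbols

def resolve(symbols, ip):
    """Return (name, offset) for the closest symbol at or below ip,
    or None if ip lies below every known symbol."""
    addrs = [s[0] for s in symbols]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None
    addr, _type, name, _module = symbols[i]
    return name, ip - addr
```

The same parser would apply to a file copied out of a guest and passed via --guestkallsyms, which is why a stale or mismatched copy silently yields wrong symbol names.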

Separate System.map's have a tendency to go out of sync (or go missing when a
devel kernel gets rebuilt, or if a devel package is not installed), and server
ports (be that a TCP port space server or a UDP port space mount-point) are
both a configuration hassle and are not guest-transparent.

So for instrumentation infrastructure (such as perf) we have a large and well
founded preference for intrinsic, built-in, kernel-provided information: i.e.
a largely 'built-in' and transparent mechanism to get to guest symbols.

The symbol server's client can certainly access the bits through vmchannel.

Ok, that would work i suspect.

Would be nice to have the symbol server in tools/perf/ and also make it easy
to add it to the initrd via a .config switch or so.

That would have basically all of the advantages of being built into the kernel
(availability, configurability, transparency, hackability), while having all
the advantages of a user-space approach as well (flexibility, extensibility,
robustness, ease of maintenance, etc.).

Note, I am not advocating building the vmchannel client into the host kernel. While that makes everything simpler for the user, it increases the kernel footprint with all the disadvantages that come with that (any bug is converted into a host DoS or worse).

So, perf would connect to qemu via (say) a well-known unix domain socket, which would then talk to the guest kernel.
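
As a rough sketch of the host-side half of that flow, the snippet below connects to a unix domain socket, sends a request, and reads the reply until the peer closes the connection. Everything specific here is an assumption made for illustration: the function name fetch_guest_kallsyms, the one-line "KALLSYMS" request, and the reply format (raw kallsyms text) are hypothetical, and neither qemu nor perf is known to implement this protocol.

```python
import socket

def fetch_guest_kallsyms(sock_path):
    """Ask a (hypothetical) per-guest symbol server for kallsyms text.

    sock_path is the unix domain socket the server listens on; the
    request and reply formats are assumptions, not a real qemu API.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(b"KALLSYMS\n")
        # Signal end of request so the server knows nothing more is coming.
        s.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = s.recv(65536)
            if not data:  # server closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

Keeping this exchange in user space means a bug in the parser or the protocol takes down one tool rather than the host kernel.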

I know you won't like it; we'll continue to disagree on this, unfortunately.

--
error compiling committee.c: too many arguments to function
