Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

From: Ingo Molnar
Date: Mon Mar 22 2010 - 06:59:42 EST



* Joerg Roedel <joro@xxxxxxxxxx> wrote:

> On Sun, Mar 21, 2010 at 07:43:00PM +0100, Ingo Molnar wrote:
> > Having access to the actual executable files that include the symbols achieves
> > precisely that - with the additional robustness that all this functionality is
> > concentrated into the host, while the guest side is kept minimal (and
> > transparent).
>
> If you want to access the guest's file-system you need a piece of software
> running in the guest which gives you this access. But when you get an event
> this piece of software may not be runnable (if the guest is in an interrupt
> handler or any other non-preemptible code path). When the host finally gets
> access to the guest's filesystem again the source of that event may already
> be gone (process has exited, module unloaded...). The only way to solve that
> is to pass the event information to the guest immediately and let it collect
> the information we want.

The very same is true of profiling in the host space as well (KVM is nothing
special here, other than its unreasonable insistence on not enumerating
readily available information in a more usable way).

So are you suggesting a solution to a perf problem we already solved
differently? (and which I argue we solved in a better way)

We have solved that in the host space already (and quite elaborately so), and
not via your suggestion of moving symbol resolution to a different stage, but
by properly generating the right events to allow the post-processing stage to
see processes that have already exited, to robustly handle files that have
been rebuilt, etc.

From an instrumentation POV it is fundamentally better to acquire the right
data and delay any complexities to the analysis stage (the perf model) than to
complicate sampling (the oprofile dcookies model).
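
To illustrate the distinction, here is a toy sketch (not perf's actual code or
data format) of recording mapping events at sampling time and resolving
symbols only in the analysis stage. The names MmapEvent, Sample, resolve and
the symtabs table are hypothetical, and the model ignores details such as
file offsets and build-ids:

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class MmapEvent:
    """Hypothetical record: a file was mapped into a process at some time."""
    time: int
    pid: int
    start: int
    length: int
    path: str


@dataclass
class Sample:
    """Hypothetical record: raw instruction pointer captured while sampling."""
    time: int
    pid: int
    ip: int


def resolve(sample, mmaps, symtabs):
    """Analysis stage: map a raw sample to (path, symbol) using recorded events.

    The process may be long gone by now; only the event log and the
    executable's symbol table (sorted list of (offset, name)) are needed.
    """
    best = None
    for m in mmaps:
        covers = m.start <= sample.ip < m.start + m.length
        if m.pid == sample.pid and m.time <= sample.time and covers:
            # Prefer the most recent mapping that was live at sample time.
            if best is None or m.time > best.time:
                best = m
    if best is None:
        return None
    offset = sample.ip - best.start
    table = symtabs[best.path]
    idx = bisect_right([off for off, _ in table], offset) - 1
    if idx < 0:
        return None
    return best.path, table[idx][1]
```

The sampling side only appends MmapEvent and Sample records; all lookups
happen later, which is why an already-exited process is not a problem in
this model.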

Your proposal of 'doing the symbol resolution in the guest context' is in
essence re-arguing the very same point that oprofile lost. Did you really
intend to re-argue that point as well? If yes then please propose an
alternative implementation for everything that perf does wrt. symbol lookups.

What we propose for 'perf kvm' right now is simply a straightforward
extension of the existing (and well-working) symbol handling code to
virtualization.

> > You need to be aware of the fact that symbol resolution is a separate step
> > from call chain generation.
>
> Same concern as above applies to call-chain generation too.

Best would be if you demonstrated any problems of the perf symbol lookup code
you are aware of on the host side, as it has that exact design you are
criticising here. We are eager to fix any bugs in it.

If you claim that it's buggy then that should very much be demonstrable - no
need to go into theoretical arguments about it.

( You should be aware of the fact that perf currently works with 'processes
exiting prematurely' and similar scenarios just fine, so if you want to
demonstrate that it's broken you will probably need a different example. )

> > > How we speak to the guest was already discussed in this thread. My
> > > personal opinion is that going through qemu is an unnecessary step and
> > > we can solve that more cleverly and transparently for perf.
> >
> > Meaning exactly what?
>
> Avi was against that but I think it would make sense to give names to
> virtual machines (with a default, similar to network interface names). Then
> we can create a directory in /dev/ with that name (e.g. /dev/vm/fedora/).
> Inside the guest a (privileged) process can create some kind of named
> virt-pipe which results in a device file created in the guest's directory
> (perf could create /dev/vm/fedora/perf for example). This file is used for
> guest-host communication.

That is kind of half of my suggestion - the built-in enumeration of guests and a
guaranteed channel to them accessible to tools. (KVM already has its own
special channel so it's not like channels of communication are useless.)

The other half of my suggestion is that if we bring this thought to its
logical conclusion then we might as well walk the whole mile and not use
quirky, binary API single-channel pipes. I.e. we could use this convenient,
human-readable, structured, hierarchical abstraction to expose information in
a fine-grained, scalable way, which has a world-class implementation in Linux:
the 'VFS namespace'.
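
As a rough sketch of what tool-side enumeration could look like under such a
scheme: the directory layout below (one directory per guest, one file per
exposed item) is purely an assumption for illustration, and enumerate_guests
is a hypothetical helper, not an existing KVM or perf interface:

```python
import os


def enumerate_guests(root):
    """List guests and the entries each one exposes under a directory tree.

    Assumes a hypothetical layout: root/<guest-name>/<entry>.
    Returns {guest_name: [entry_name, ...]} with names sorted.
    """
    guests = {}
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir():
            guests[entry.name] = sorted(f.name for f in os.scandir(entry.path))
    return guests
```

A tool would call it with whatever mount point ends up hosting the hierarchy;
nothing beyond ordinary directory operations is needed, which is the appeal
of using the filesystem namespace rather than a binary pipe protocol.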

Thanks,

Ingo