Re: [PATCH V2 2/5] ara virt interface of perf to support kvm guestos statistics collection in guest os

From: Avi Kivity
Date: Mon Jun 21 2010 - 08:00:50 EST


On 06/21/2010 12:31 PM, Zhang, Yanmin wrote:
The 2nd patch is to change the definition of perf_event to facilitate
perf attr copy when a hypercall happens.

Signed-off-by: Zhang Yanmin<yanmin_zhang@xxxxxxxxxxxxxxx>

---

--- linux-2.6_tip0620/include/linux/perf_event.h 2010-06-21 15:19:52.821999849 +0800
+++ linux-2.6_tip0620perfkvm/include/linux/perf_event.h 2010-06-21 16:53:49.283999849 +0800
@@ -188,7 +188,10 @@ struct perf_event_attr {
__u64 sample_type;
__u64 read_format;


Assuming these flags are available to the guest?

- __u64 disabled : 1, /* off by default */
+ union {
+ __u64 flags;
+ struct {
+ __u64 disabled : 1, /* off by default */
inherit : 1, /* children inherit it */

inherit is meaningless for a guest.

pinned : 1, /* must always be on PMU */

We cannot allow a guest to pin a counter.

The other flags are also problematic. I'd like to see virt-specific flags (probably we'll only need kernel/user and nested_hv for nested virtualization).

Something that is worrying is that we don't expose group information. perf will multiplex the events for us, but there will be a loss in accuracy.

#ifdef CONFIG_HAVE_HW_BREAKPOINT
#include<asm/hw_breakpoint.h>
#endif
@@ -753,6 +752,20 @@ struct perf_event {

perf_overflow_handler_t overflow_handler;

+ /*
+ * pointers used by kvm perf paravirt interface.
+ *
+ * 1) Used in host kernel and points to host_perf_shadow which
+ * has information about guest perf_event
+ */
+ void *host_perf_shadow;

Can we have real types instead of void pointers?

+ /*
+ * 2) Used in guest kernel and points to guest_perf_shadow which
+ * is used as a communication area with host kernel. Host kernel
+ * copies overflow data to it when an event overflows.
+ */
+ void *guest_perf_shadow;

It's strange to see both guest and host parts in the same patch. Splitting to separate patches will really help review.

@@ -1626,9 +1629,22 @@ void perf_event_task_tick(struct task_st
if (ctx&& ctx->nr_events&& ctx->nr_events != ctx->nr_active)
rotate = 1;

- perf_ctx_adjust_freq(&cpuctx->ctx);
- if (ctx)
- perf_ctx_adjust_freq(ctx);
+#ifdef CONFIG_KVM_PERF
+ if (kvm_para_available()) {
+ /*
+ * perf_ctx_adjust_freq causes lots of pmu->read which would
+ * trigger too many vmexit to host kernel. We disable it
+ * under para virt situation
+ */
+ adjust_freq = 0;
+ }
+#endif

Perhaps we can have a batch read interface which will read many counters at once. This would reduce the number of exits. Also adjust the frequency less frequently.

+
+ if (adjust_freq) {
+ perf_ctx_adjust_freq(&cpuctx->ctx);
+ if (ctx)
+ perf_ctx_adjust_freq(ctx);
+ }


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/