Re: [PATCH 8/18] 2.6.17.9 perfmon2 patch for review: event sets and multiplexing support

From: Andrew Morton
Date: Wed Aug 23 2006 - 18:59:09 EST


On Wed, 23 Aug 2006 01:05:59 -0700
Stephane Eranian <eranian@xxxxxxxxxxxxxxxxx> wrote:

> This patch contains the event set and multiplexing support.
>
> On many PMU models, there are not enough counters to collect
> certain metrics in one run. Even on those that have potentially
> lots of counters, e.g. P4 with 18, there are oftentimes constraints
> which make measuring certain events together impossible. In those
> situations the user has no choice but to measure with multiple
> runs, which is not always practical and is prone to errors.
>
> One way to alleviate the problem is to introduce the notion
> of an event set. Each set encapsulates the entire PMU state.
> If a PMU has M counters then each set can define M events.
> Multiple sets can be defined. They are then multiplexed onto
> the actual PMU such that only one is active at any time.
> The collected counts can then be scaled to get an *estimate*
> of what they would have been had each event been measured across
> the entire run. It is important to note that this remains an
> estimate. The faster we can switch, the smaller the blind spots are
> but the higher the overhead is.
>
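
As a rough illustration of the scaling described above (not code from the
patch): a minimal sketch assuming a simple linear extrapolation, where each
set records how long it was actually loaded on the PMU. The struct, its
field names and scale_count() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-set bookkeeping; these names are not from the patch. */
struct set_sample {
	uint64_t raw_count;	/* events counted while the set was on the PMU */
	uint64_t active_ns;	/* time the set was actually loaded */
};

/*
 * Linear extrapolation: assume the event rate observed while the set was
 * active also held while it was switched out. This is only an estimate.
 */
static double scale_count(const struct set_sample *s, uint64_t total_ns)
{
	assert(s != NULL);

	if (s->active_ns == 0)
		return 0.0;	/* set never ran: no basis for an estimate */

	return (double)s->raw_count * (double)total_ns / (double)s->active_ns;
}
```

For example, a set that counted 1000 events while active for 250 ns of a
1000 ns run would be scaled to 4000. The blind spots are exactly the
periods where this constant-rate assumption may not hold.
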
> Sets and set switching can be implemented at the user level. Yet
> by having kernel support for it, we can significantly improve
> performance especially for non self-monitoring per-thread context where
> we guarantee switching always occurs in the context of the monitored thread.
>
> By default, any perfmon2 context is created with a default set, i.e., set0.
> Sets can be dynamically created/deleted with specific system calls. A set
> is identified by a simple number (0-65535). The number determines the
> position of the set in an ordered list. The order in the list determines
> the switch order. Switching occurs in a round-robin fashion.
>
> Switching can be triggered by a timeout or after a certain number of overflows.
> The type of switching as well as the timeout is determined per set.
> The timeout granularity is determined by that of the timer tick. The actual
> timeout value is returned to the user.
>
> The file perfmon_sets.c implements:
> - set-related back-end system calls: __pfm_create_evtsets(), __pfm_delete_evtsets(), __pfm_getinfo_evtsets()
> - set switching: pfm_switch_sets(), __pfm_handle_switch_timeout()
>
>
> ...
>
> +struct pfm_event_set *pfm_find_set(struct pfm_context *ctx, u16 set_id,
> + int alloc)
> +{
> + kmem_cache_t *cachep;
> + struct pfm_event_set *set, *new_set, *prev;
> + unsigned long offs;
> + size_t view_size;
> + void *view;
> +
> + PFM_DBG("looking for set=%u", set_id);
> +
> + /*
> + * shortcut for set 0: always exist, cannot be removed
> + */
> + if (set_id == 0 && !alloc)
> + return list_entry(ctx->list.next, struct pfm_event_set, list);
> +
> + prev = NULL;
> + list_for_each_entry(set, &ctx->list, list) {
> + if (set->id == set_id)
> + return set;
> + if (set->id > set_id)
> + break;
> + prev = set;
> + }
> +
> + if (!alloc)
> + return NULL;
> +
> + cachep = ctx->flags.mapset ? pfm_set_cachep : pfm_lg_set_cachep;
> +
> + new_set = kmem_cache_alloc(cachep, SLAB_ATOMIC);

SLAB_ATOMIC is unreliable. Is it possible to use SLAB_KERNEL here? If
some callers can sleep and others cannot, then passing in the gfp_flags
would permit improvement here.

> + if (new_set) {
> + memset(new_set, 0, sizeof(*set));

kmem_cache_zalloc() exists.
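
To make the two suggestions above concrete, here is a userspace sketch of
the shape being asked for: the caller, which knows whether it may sleep,
passes the allocation flags in, and the allocation returns zeroed memory in
one step. Everything here is a stand-in so that it compiles outside the
kernel: SKETCH_ATOMIC and SKETCH_KERNEL mimic SLAB_ATOMIC and SLAB_KERNEL,
sketch_zalloc() mimics kmem_cache_zalloc(), and struct event_set and
alloc_set() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins; the real flags are SLAB_ATOMIC and SLAB_KERNEL. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_ATOMIC 0x1u	/* caller cannot sleep */
#define SKETCH_KERNEL 0x2u	/* caller may sleep */

/* Hypothetical, much reduced event set. */
struct event_set {
	unsigned short id;
	unsigned long pmds[4];
};

/* Stand-in for kmem_cache_zalloc(): zeroed memory, or NULL on failure. */
static void *sketch_zalloc(size_t size, sketch_gfp_t flags)
{
	assert(flags == SKETCH_ATOMIC || flags == SKETCH_KERNEL);
	return calloc(1, size);
}

/* The caller picks the flags, so sleepable paths need not use ATOMIC. */
static struct event_set *alloc_set(unsigned short id, sketch_gfp_t flags)
{
	struct event_set *set = sketch_zalloc(sizeof(*set), flags);

	if (set)
		set->id = id;	/* all other fields are already zero */
	return set;
}
```

A caller in process context would use alloc_set(id, SKETCH_KERNEL), and only
callers that really cannot sleep would pass SKETCH_ATOMIC. Using
sizeof(*set) on the pointer being allocated also avoids taking the size
from a different variable than the one being cleared.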

> + if (ctx->flags.mapset) {
> + view_size = PAGE_ALIGN(sizeof(struct pfm_set_view));
> + view = vmalloc(view_size);

vmalloc() sleeps, so this _could_ have used SLAB_KERNEL.

> +static struct page *pfm_view_map_pagefault(struct vm_area_struct *vma,
> + unsigned long address, int *type)
> +{
> + void *kaddr;
> + struct page *page;
> +
> + kaddr = vma->vm_private_data;
> + if (kaddr == NULL) {
> + PFM_DBG("no view");
> + return NOPAGE_SIGBUS;
> + }
> +
> + if ( (address < (unsigned long) vma->vm_start) ||
> + (address > (unsigned long) (vma->vm_start + PAGE_SIZE)) )

Should that be >=?
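
A small userspace sketch of why the comparison matters, assuming the view
is exactly one page so the valid range is the half-open interval
[vm_start, vm_start + PAGE_SIZE). SKETCH_PAGE_SIZE and the helper names are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

/* Check as written in the patch: rejects only addr > start + PAGE_SIZE. */
static bool out_of_range_gt(unsigned long addr, unsigned long start)
{
	return addr < start || addr > start + SKETCH_PAGE_SIZE;
}

/* Check with >=: accepts exactly [start, start + PAGE_SIZE). */
static bool out_of_range_ge(unsigned long addr, unsigned long start)
{
	return addr < start || addr >= start + SKETCH_PAGE_SIZE;
}

/* The two checks disagree on exactly one address: start + PAGE_SIZE. */
static void show_boundary(void)
{
	unsigned long start = 0x10000UL;

	/* One past the end slips through the '>' version... */
	assert(!out_of_range_gt(start + SKETCH_PAGE_SIZE, start));
	/* ...and is rejected by the '>=' version. */
	assert(out_of_range_ge(start + SKETCH_PAGE_SIZE, start));
	/* Both agree on the last valid byte of the page. */
	assert(!out_of_range_gt(start + SKETCH_PAGE_SIZE - 1, start));
	assert(!out_of_range_ge(start + SKETCH_PAGE_SIZE - 1, start));
}
```

With '>', a fault at vm_start + PAGE_SIZE, which is the first byte past
the view, would not be rejected.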

