Re: [PATCH 2/9] perf_counter: fix update_userpage()

From: Paul Mackerras
Date: Sat Mar 28 2009 - 20:25:42 EST

Next message: Thomas Gleixner: "Re: [git-pull -tip] x86: include inverse Xmas tree patches"
Previous message: Paul Mackerras: "Re: [PATCH 1/9] perf_counter: unify and fix delayed counter wakeup"
In reply to: Peter Zijlstra: "[PATCH 2/9] perf_counter: fix update_userpage()"
Next in thread: Peter Zijlstra: "[PATCH 7/9] perf_counter: make it possible for hw_perf_counter_init to return error codes"

Peter Zijlstra writes:

> It just occurred to me it is possible to have multiple contending
> updates of the userpage (mmap information vs overflow vs counter).
> This would break the seqlock logic.
>
> It appears the arch code uses this from NMI context, so we cannot
> possibly serialize its use, therefore separate the data_head update
> from it and let it return to its original use.

That sounds reasonable, and thanks for putting in a big comment.

Acked-by: Paul Mackerras <paulus@xxxxxxxxx>

> --- linux-2.6.orig/include/linux/perf_counter.h
> +++ linux-2.6/include/linux/perf_counter.h
> @@ -160,10 +160,45 @@ struct perf_counter_hw_event {
>  struct perf_counter_mmap_page {
>          __u32   version;                /* version number of this structure */
>          __u32   compat_version;         /* lowest version this is compat with */
> +
> +        /*
> +         * Bits needed to read the hw counters in user-space.
> +         *
> +         * The index and offset should be read atomically using the seqlock:
> +         *
> +         *   __u32 seq, index;
> +         *   __s64 offset;
> +         *
> +         * again:
> +         *   rmb();
> +         *   seq = pc->lock;
> +         *
> +         *   if (unlikely(seq & 1)) {
> +         *     cpu_relax();
> +         *     goto again;
> +         *   }
> +         *
> +         *   index = pc->index;
> +         *   offset = pc->offset;
> +         *
> +         *   rmb();
> +         *   if (pc->lock != seq)
> +         *     goto again;
> +         *
> +         * After this, index contains architecture specific counter index + 1,
> +         * so that 0 means unavailable, offset contains the value to be added
> +         * to the result of the raw timer read to obtain this counter's value.
> +         */
>          __u32   lock;                   /* seqlock for synchronization */
>          __u32   index;                  /* hardware counter identifier */
>          __s64   offset;                 /* add to hardware counter value */

I think we can simplify this (in a follow-on patch).

It has occurred to me that we don't need to do all this on the
userspace side, because we are necessarily reading and writing these
fields on the same CPU. If the reader and writer were on different
CPUs, that would make no sense since they would be accessing different
hardware counter registers.

That means that we don't need any CPU memory barriers on either side.
All the kernel needs to do is to increment `lock' when it updates
things, and the user side can be:

	do {
		seq = pc->lock;
		index = pc->index;
		offset = pc->offset;
		barrier();
	} while (pc->lock != seq);

and all that's needed is a compiler barrier to stop the compiler from
optimizing too much.
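
For illustration only, the loop above could be fleshed out into a
self-contained user-space sketch along the following lines. The struct
`mmap_page_sketch` and the helpers `sketch_update()` and `sketch_read()`
are hypothetical names invented for this example, not kernel API; the
writer stands in for the kernel side and simply bumps `lock` after
changing the fields. `barrier()` is defined here with the GCC/Clang
empty-asm idiom. The extra `barrier()` right after the load of `lock` is
an addition in this sketch, so the compiler cannot hoist the field loads
above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three fields discussed above. */
struct mmap_page_sketch {
	uint32_t lock;    /* incremented by the writer on every update */
	uint32_t index;   /* hardware counter index + 1, 0 = unavailable */
	int64_t  offset;  /* added to the raw hardware counter read */
};

/* Compiler-only barrier (GCC/Clang syntax); emits no CPU instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the kernel side: change the fields, then bump lock. */
static void sketch_update(struct mmap_page_sketch *pc,
			  uint32_t index, int64_t offset)
{
	pc->index = index;
	pc->offset = offset;
	barrier();
	pc->lock++;
}

/*
 * Reader: retry if lock changed while the fields were being read.
 * Assumes reader and writer run on the same CPU, so no CPU memory
 * barriers are used, only compiler barriers.
 */
static void sketch_read(const struct mmap_page_sketch *pc,
			uint32_t *index, int64_t *offset)
{
	uint32_t seq;

	assert(pc != NULL);
	do {
		seq = pc->lock;
		barrier();	/* added in this sketch, see lead-in */
		*index = pc->index;
		*offset = pc->offset;
		barrier();
	} while (pc->lock != seq);
}
```

A caller would pass a pointer to the mapped page and use the returned
index and offset only once `sketch_read()` has returned. Single-threaded
use, as here, never takes the retry path; that only happens when an
update interrupts the reader between the two loads of `lock`.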

Paul.
