Re: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing

From: Davidlohr Bueso
Date: Wed Oct 19 2016 - 14:43:22 EST


On Wed, 19 Oct 2016, Sebastian Andrzej Siewior wrote:

On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
Sebastian noted that the overhead of worker thread ops (throughput)
accounting was causing 'perf' itself to appear in the profiles,
consuming a non-trivial (i.e. 13%) amount of CPU. This is caused by
cacheline bouncing from the increment of w->ops. We can easily fix
this by working on a local copy and updating the actual worker once
done running and ready to show the program summary. There is no
danger of concurrent access to the worker, so we can trust that no
stale value is seen by another thread.

Reported-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Acked-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

Thanks.


--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
 static void *workerfn(void *arg)
 {
 	int ret;
-	unsigned int i;
 	struct worker *w = (struct worker *) arg;
+	unsigned int i;
+	unsigned long ops = w->ops; /* avoid cacheline bouncing */

We start at 0, so there is probably no need to initialize it from w->ops.

Yeah, but I prefer having it this way: it separates the init from the
actual work (although no big deal here). The extra load happens ncpu
times, so also no big deal.
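
For anyone following along, here is a rough standalone sketch of the
pattern being discussed: count in a local variable and store to the
shared worker once at the end. It is not the perf code. The struct
layout, the iters field, MAX_WORKERS and run_workers() are all made up
for illustration, and a fixed iteration count stands in for the timed
futex loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64	/* hypothetical cap, for this sketch only */

/* Hypothetical stand-in for the benchmark's struct worker. */
struct worker {
	pthread_t thread;
	unsigned long ops;	/* neighbouring workers may share a cacheline */
	unsigned long iters;	/* assumed: fixed work instead of a timed run */
};

static void *workerfn(void *arg)
{
	struct worker *w = arg;
	unsigned long ops = w->ops;	/* local copy, private to this thread */
	unsigned long i;

	for (i = 0; i < w->iters; i++)
		ops++;			/* stands in for one futex op */

	w->ops = ops;			/* single store to the shared array */
	return NULL;
}

/* Run nthreads workers and return the summed op count. */
unsigned long run_workers(unsigned int nthreads, unsigned long iters)
{
	struct worker workers[MAX_WORKERS];
	unsigned long total = 0;
	unsigned int i, started = 0;

	assert(nthreads <= MAX_WORKERS);

	for (i = 0; i < nthreads; i++) {
		workers[i].ops = 0;
		workers[i].iters = iters;
		if (pthread_create(&workers[i].thread, NULL, workerfn,
				   &workers[i]) != 0)
			break;
		started++;
	}

	/* pthread_join() orders each worker's final store before our read. */
	for (i = 0; i < started; i++) {
		pthread_join(workers[i].thread, NULL);
		total += workers[i].ops;
	}

	return total;
}
```

Calling run_workers(4, 100000) should sum to 400000 ops. The only
stores to the shared array are the initialization and one write-back
per worker, and the summary is read after pthread_join(), so no reader
can observe a stale count.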