Re: [PATCH] perf lock: clean the options for perf record

From: Hitoshi Mitake
Date: Thu Feb 24 2011 - 10:47:11 EST


On 2011/02/23 13:17, Hitoshi Mitake wrote:
> On 2011/02/23 03:22, Frederic Weisbecker wrote:
>> On Tue, Feb 22, 2011 at 04:43:35PM +0100, Peter Zijlstra wrote:
>>> On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
>>>> How do you think about it?

>>> Most of the lock code (esp the spinlock stuff) is already way over the
>>> threshold of sanity, adding to that for some dubious reasons doesn't
>>> seem like a good idea.
>>>
>>> I'm still not at all sure why people want all this lock tracing.

>> Right, well I can imagine many usecases that could make lock
>> tracing bring more value than what lockstat already provides,
>> through a tool like perf lock if we enhance it.
>>
>> We should probably first focus on developing the tooling side
>> and make it useful enough that optimizations in the kernel
>> side become desirable.


> Yes, lockstat only provides lock usage statistics for the system
> as a whole. perf lock will be able to provide partial information
> for a specified time range, or the degree of dependency
> between locks.


As a trial, I created new tracepoints for rwsem and tested them.
The events are named rwsem_{acquire, contended, acquired, release},
and their meanings are similar to those of lock_{...}.
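
For reference, the tracepoints can be declared roughly like this. This is
only a sketch, not the actual patch: it assumes a new, hypothetical
include/trace/events/rwsem.h in which each event records nothing but the
semaphore address.

/*
 * Sketch only: hypothetical include/trace/events/rwsem.h, not the
 * actual patch.  One shared event class keeps the four events cheap.
 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM rwsem

#if !defined(_TRACE_RWSEM_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_RWSEM_H

#include <linux/tracepoint.h>

struct rw_semaphore;

DECLARE_EVENT_CLASS(rwsem_event,

	TP_PROTO(struct rw_semaphore *sem),

	TP_ARGS(sem),

	TP_STRUCT__entry(
		__field(void *, sem)
	),

	TP_fast_assign(
		__entry->sem = sem;
	),

	TP_printk("sem=%p", __entry->sem)
);

DEFINE_EVENT(rwsem_event, rwsem_acquire,
	TP_PROTO(struct rw_semaphore *sem),
	TP_ARGS(sem));

DEFINE_EVENT(rwsem_event, rwsem_contended,
	TP_PROTO(struct rw_semaphore *sem),
	TP_ARGS(sem));

DEFINE_EVENT(rwsem_event, rwsem_acquired,
	TP_PROTO(struct rw_semaphore *sem),
	TP_ARGS(sem));

DEFINE_EVENT(rwsem_event, rwsem_release,
	TP_PROTO(struct rw_semaphore *sem),
	TP_ARGS(sem));

#endif /* _TRACE_RWSEM_H */

/* This part must be outside protection */
#include <trace/define_trace.h>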

I traced perf bench sched messaging, and the result was:

mitake@x201i:~/linux/.../tools/perf% ./perf bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 1.252 [sec]
mitake@x201i:~/linux/.../tools/perf% sudo ./perf record -R -m 1024 -c 1 -e rwsem:rwsem_acquire -e rwsem:rwsem_release,rwsem:rwsem_contended,rwsem:rwsem_acquired ./perf bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 1.332 [sec]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 13.495 MB perf.data (~589597 samples) ]

Raw execution of sched messaging took 1.252 sec, and the traced version
took 1.332 sec, i.e. about 6% overhead. This is far smaller than the
overhead of the current lock tracepoints.

I think it is possible to write some meaningful tools on top of these
events, such as measuring the reader/writer ratio. If I can write
something useful, I'll post it.
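
For example, something as small as this could already summarize contention
wait time. It is only a sketch: it assumes the recorded events have been
reduced to text lines of the form "<timestamp-ns> <event> <sem>" (e.g. with
perf script plus a little post-processing; that input format is my
assumption, not what perf script prints by default), and measuring the
reader/writer ratio itself would additionally need a read/write flag in
the rwsem_acquire event.

/*
 * Sketch only: reads "<timestamp-ns> <event> <sem>" lines from stdin
 * and pairs rwsem_contended with the following rwsem_acquired to sum
 * up contention wait time.  A real tool would key this by semaphore
 * address and by thread.
 */
#include <stdio.h>
#include <string.h>
#include <inttypes.h>

int main(void)
{
	uint64_t ts, contended_since = 0;
	uint64_t acquires = 0, contentions = 0, total_wait = 0;
	char event[64];

	while (scanf("%" SCNu64 " %63s %*s", &ts, event) == 2) {
		if (strcmp(event, "rwsem:rwsem_acquire") == 0) {
			acquires++;
		} else if (strcmp(event, "rwsem:rwsem_contended") == 0) {
			contentions++;
			contended_since = ts;		/* wait starts */
		} else if (strcmp(event, "rwsem:rwsem_acquired") == 0 &&
			   contended_since) {
			total_wait += ts - contended_since;
			contended_since = 0;		/* wait ends */
		}
	}

	printf("acquisitions: %" PRIu64 "\n", acquires);
	printf("contended:    %" PRIu64 "\n", contentions);
	printf("total wait:   %" PRIu64 " ns\n", total_wait);
	return 0;
}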

