[tip:perf/core] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task

From: tip-bot for Yunlong Song
Date: Wed Apr 08 2015 - 11:13:26 EST


Commit-ID: 1aff59be53ef37aa9943fb5f772f03148f789bb6
Gitweb: http://git.kernel.org/tip/1aff59be53ef37aa9943fb5f772f03148f789bb6
Author: Yunlong Song <yunlong.song@xxxxxxxxxx>
AuthorDate: Tue, 31 Mar 2015 21:46:33 +0800
Committer: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
CommitDate: Wed, 8 Apr 2015 09:07:25 -0300

perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task

Since there is sem_wait for each task in the wait_for_tasks(), e.g.
sem_wait(&task->work_done_sem).

The sem_wait can continue only when work_done_sem is greater than 0, or
it will be blocked.

For perf sched replay, one task may sem_post the work_done_sem of
another task, which causes the work_done_sem of that task processed in a
reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...

This sequence simulates the sched process of the running tasks at the
time when perf sched record runs.

As a result, all the tasks are required and their threads must be
successfully created.

If any one (task A) of the tasks fails to create its thread, then
another task (task B), whose work_done_sem needs sem_post from that
failed task A, may likely block itself due to seg_wait.

And this is a dead halt, since task B's thread_func cannot continue at
all.

To solve this problem, perf sched replay should exit once any task fails
to create its thread.

Example:

Test environment: x86_64 with 160 cores

Before this patch:

$ perf sched replay
...
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
------------------------------------------------------------ <- dead halt

After this patch:

$ perf sched replay
...
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
$

As shown above, perf sched replay finishes the process after printing an
error message and does not block itself.

Signed-off-by: Yunlong Song <yunlong.song@xxxxxxxxxx>
Cc: Paul Mackerras <paulus@xxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Wang Nan <wangnan0@xxxxxxxxxx>
Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@xxxxxxxxxx
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
---
tools/perf/builtin-sched.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7fe3b3c..3261300 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -451,10 +451,12 @@ static int self_open_counters(void)
fd = sys_perf_event_open(&attr, 0, -1, -1,
perf_event_open_cloexec_flag());

- if (fd < 0)
+ if (fd < 0) {
pr_err("Error: sys_perf_event_open() syscall returned "
"with %d (%s)\n", fd,
strerror_r(errno, sbuf, sizeof(sbuf)));
+ exit(EXIT_FAILURE);
+ }
return fd;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/