[tip:perf/core] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files

From: tip-bot for Yunlong Song
Date: Wed Apr 08 2015 - 11:13:44 EST


Commit-ID: 939cda521a24ae4dbf3beec983abd519bce56231
Gitweb: http://git.kernel.org/tip/939cda521a24ae4dbf3beec983abd519bce56231
Author: Yunlong Song <yunlong.song@xxxxxxxxxx>
AuthorDate: Tue, 31 Mar 2015 21:46:34 +0800
Committer: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
CommitDate: Wed, 8 Apr 2015 09:07:26 -0300

perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files

The soft maximum number of open files for a calling process is 1024,
which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the
hard maximum number of open files for a calling process is 4096, which
is defined as INR_OPEN_MAX in include/uapi/linux/fs.h.

Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of
RLIMIT_NOFILE in include/asm-generic/resource.h.

And the soft maximum number finally decides the limitation of the
maximum files which are allowed to be opened.

That is to say a process can use at most 1024 file descriptors for its
o pened files, or an EMFILE error will happen.

This error can be fixed by increasing the soft maximum number, under the
constraint that the soft maximum number can not exceed the hard maximum
number, or both soft and hard maximum number should be increased
simultaneously with privilege.

For perf sched replay, it uses sys_perf_event_open to create the file
descriptor for each of the tasks in order to handle information of perf
events.

That is to say each task needs a unique file descriptor. In x86_64,
there may be over 1024 or 4096 tasks correspoinding to the record in
perf.data, which causes that no enough file descriptors can be used.

As a result, EMFILE error happens and stops the replay process. To solve
this problem, we adaptively increase the soft and hard maximum number of
open files with a '-f' option.

Example:

Test environment: x86_64 with 160 cores

$ cat /proc/sys/kernel/pid_max
163840
$ cat /proc/sys/fs/file-max
6815744
$ ulimit -Sn
1024
$ ulimit -Hn
4096

Before this patch:

$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)

After this patch:

$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
Have a try with -f option

$ perf sched replay -f
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
------------------------------------------------------------
#1 : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
#2 : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
#3 : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
#4 : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
#5 : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
#6 : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
#7 : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
#8 : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
#9 : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
#10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48

Signed-off-by: Yunlong Song <yunlong.song@xxxxxxxxxx>
Cc: Paul Mackerras <paulus@xxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Wang Nan <wangnan0@xxxxxxxxxx>
Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@xxxxxxxxxx
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
---
tools/perf/builtin-sched.c | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3261300..5ab58c6 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -170,6 +170,7 @@ struct perf_sched {
u64 cpu_last_switched[MAX_CPUS];
struct rb_root atom_root, sorted_atom_root;
struct list_head sort_list, cmp_pid;
+ bool force;
};

static u64 get_nsecs(void)
@@ -437,24 +438,43 @@ static u64 get_cpu_usage_nsec_parent(void)
return sum;
}

-static int self_open_counters(void)
+static int self_open_counters(struct perf_sched *sched, unsigned long cur_task)
{
struct perf_event_attr attr;
- char sbuf[STRERR_BUFSIZE];
+ char sbuf[STRERR_BUFSIZE], info[STRERR_BUFSIZE];
int fd;
+ struct rlimit limit;
+ bool need_privilege = false;

memset(&attr, 0, sizeof(attr));

attr.type = PERF_TYPE_SOFTWARE;
attr.config = PERF_COUNT_SW_TASK_CLOCK;

+force_again:
fd = sys_perf_event_open(&attr, 0, -1, -1,
perf_event_open_cloexec_flag());

if (fd < 0) {
+ if (errno == EMFILE) {
+ if (sched->force) {
+ BUG_ON(getrlimit(RLIMIT_NOFILE, &limit) == -1);
+ limit.rlim_cur += sched->nr_tasks - cur_task;
+ if (limit.rlim_cur > limit.rlim_max) {
+ limit.rlim_max = limit.rlim_cur;
+ need_privilege = true;
+ }
+ if (setrlimit(RLIMIT_NOFILE, &limit) == -1) {
+ if (need_privilege && errno == EPERM)
+ strcpy(info, "Need privilege\n");
+ } else
+ goto force_again;
+ } else
+ strcpy(info, "Have a try with -f option\n");
+ }
pr_err("Error: sys_perf_event_open() syscall returned "
- "with %d (%s)\n", fd,
- strerror_r(errno, sbuf, sizeof(sbuf)));
+ "with %d (%s)\n%s", fd,
+ strerror_r(errno, sbuf, sizeof(sbuf)), info);
exit(EXIT_FAILURE);
}
return fd;
@@ -542,7 +562,7 @@ static void create_tasks(struct perf_sched *sched)
BUG_ON(parms == NULL);
parms->task = task = sched->tasks[i];
parms->sched = sched;
- parms->fd = self_open_counters();
+ parms->fd = self_open_counters(sched, i);
sem_init(&task->sleep_sem, 0, 0);
sem_init(&task->ready_for_work, 0, 0);
sem_init(&task->work_done_sem, 0, 0);
@@ -1700,6 +1720,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
"be more verbose (show symbol address, etc)"),
OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
"dump raw trace in ASCII"),
+ OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
OPT_END()
};
const struct option sched_options[] = {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/