Re: [PATCH] perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir()

From: Yang Jihong
Date: Fri Oct 20 2023 - 21:47:48 EST


Hello,

On 2023/10/20 13:59, Namhyung Kim wrote:
Hello,

On Fri, Oct 13, 2023 at 1:01 AM Yang Jihong <yangjihong1@xxxxxxxxxx> wrote:

When parallel threads are used to collect data, perf record needs at least 6 fds
per CPU: one for sys_perf_event_open, four for the msg and ack pipes (see
record__thread_data_open_pipes()), and one for opening perf.data.XXX.

Yep, probably one more for the dummy event.

On a machine with more than 100 cores, running perf record with both `-a`
and `--threads` can easily exceed the file descriptor limit. When we run
out of descriptors, try to increase the limits.
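As a rough check of the arithmetic, here is a minimal sketch that multiplies the per-CPU counts listed in this thread (6 fds, possibly 7 with the dummy event) by the number of CPUs. The constants and the `estimate_fds()` helper are hypothetical and for illustration only. They are not taken from perf's source, and they ignore descriptors perf holds for other purposes.

```c
/*
 * Rough estimate of the file descriptors `perf record --threads -a`
 * needs, using the per-CPU breakdown from this thread. These counts
 * are assumptions for illustration, not values taken from perf.
 */
enum {
	FDS_EVENT     = 1, /* sys_perf_event_open() */
	FDS_PIPES     = 4, /* msg and ack pipes, two ends each */
	FDS_DATA_FILE = 1, /* perf.data.XXX */
	FDS_DUMMY     = 1, /* possible extra fd for the dummy event */
};

/* Hypothetical helper: descriptors needed for nr_cpus per-CPU threads. */
static long estimate_fds(long nr_cpus, int with_dummy)
{
	long per_cpu = FDS_EVENT + FDS_PIPES + FDS_DATA_FILE;

	if (with_dummy)
		per_cpu += FDS_DUMMY;
	return nr_cpus * per_cpu;
}
```

For the 160-CPU machine (CPUs 0-159) in the reproduction, `estimate_fds(160, 0)` gives 960 and `estimate_fds(160, 1)` gives 1120. The second value exceeds the default soft limit of 1024 on its own. The first leaves little headroom once stdin, stdout, stderr and any other descriptors perf already holds are counted.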

Before:
$ ulimit -n
1024
$ lscpu | grep 'On-line CPU(s)'
On-line CPU(s) list: 0-159
$ perf record --threads -a sleep 1
Failed to create data directory: Too many open files

After:
$ ulimit -n
1024
$ lscpu | grep 'On-line CPU(s)'
On-line CPU(s) list: 0-159
$ perf record --threads -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ]
Signed-off-by: Yang Jihong <yangjihong1@xxxxxxxxxx>
---
tools/perf/util/data.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index fc16299c915f..098f9e3bb2e7 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -17,6 +17,7 @@
#include "util.h" // rm_rf_perf_data()
#include "debug.h"
#include "header.h"
+#include "evsel.h"
#include <internal/lib.h>

static void close_dir(struct perf_data_file *files, int nr)
@@ -35,6 +36,7 @@ void perf_data__close_dir(struct perf_data *data)

int perf_data__create_dir(struct perf_data *data, int nr)
{
+ enum rlimit_action set_rlimit = NO_CHANGE;
struct perf_data_file *files = NULL;
int i, ret;

@@ -54,11 +56,21 @@ int perf_data__create_dir(struct perf_data *data, int nr)
goto out_err;
}

+retry_open:
ret = open(file->path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
if (ret < 0) {
+ /*
+ * If using parallel threads to collect data,
+ * perf record needs at least 6 fds per CPU.
+ * When we run out of them try to increase the limits.
+ */
+ if (errno == EMFILE && evsel__increase_rlimit(&set_rlimit))

It seems weird that we have this helper with evsel prefix and
it does nothing with evsel. But it's a separate concern, so

Acked-by: Namhyung Kim <namhyung@xxxxxxxxxx>

Thanks for the Acked-by tag.

Uh... indeed, the name of this helper isn't appropriate.
Okay, I'll submit a separate patch to rename it.
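For reference, the retry-on-EMFILE pattern from the hunk above can be sketched with a helper that carries no evsel prefix. This is a minimal sketch under stated assumptions, not perf's code. The names `raise_nofile_limit()`, `open_data_file()` and `enum nofile_state` are hypothetical. The sketch only raises the RLIMIT_NOFILE soft limit to the hard limit, using the standard getrlimit()/setrlimit() calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/stat.h>

/*
 * Hypothetical state, similar in spirit to the patch's
 * `enum rlimit_action set_rlimit = NO_CHANGE;`: it records whether the
 * limit was already raised, so the retry cannot loop forever.
 */
enum nofile_state {
	NOFILE_UNCHANGED,
	NOFILE_RAISED_TO_HARD,
};

/*
 * Raise the RLIMIT_NOFILE soft limit to the hard limit, at most once.
 * Returns true if the limit was raised and the caller should retry.
 * errno is restored, so the caller still sees the original error.
 */
static bool raise_nofile_limit(enum nofile_state *state)
{
	int saved_errno = errno;
	struct rlimit rl;
	bool raised = false;

	if (*state == NOFILE_UNCHANGED &&
	    getrlimit(RLIMIT_NOFILE, &rl) == 0 &&
	    rl.rlim_cur < rl.rlim_max) {
		rl.rlim_cur = rl.rlim_max;
		if (setrlimit(RLIMIT_NOFILE, &rl) == 0) {
			*state = NOFILE_RAISED_TO_HARD;
			raised = true;
		}
	}
	errno = saved_errno;
	return raised;
}

/*
 * open() wrapper: on EMFILE, raise the limit once and retry. The state
 * is passed in so one caller can share it across many files.
 */
static int open_data_file(const char *path, enum nofile_state *state)
{
	int fd;

retry:
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd < 0 && errno == EMFILE && raise_nofile_limit(state))
		goto retry;
	return fd;
}
```

The state is shared across every file opened in the loop, as `set_rlimit` is in perf_data__create_dir(), so the limit is raised at most once per call. The sketch stops at the hard limit because raising the hard limit itself requires CAP_SYS_RESOURCE.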

Thanks,
Yang