Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()

From: Steven Rostedt
Date: Thu Jan 04 2024 - 15:04:01 EST


On Thu, 4 Jan 2024 14:02:46 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> > And that very fact actually makes me wonder:
> >
> > > for (i = 0; i < ei->nr_entries; i++) {
> > > + void *cdata = ei->data;
> > > +
> > > + if (c > 0) {
> > > + c--;
> > > + continue;
> > > + }
> >
> > The 'ei->nr_entries' things are in a stable array, so the indexing for
> > them cannot change (ie even if "is_freed" were to be set the array is
> > still stable).
>
> Yeah, the entries is fixed. If the ei itself gets freed, so does all the
> entries at the same time. The individual entries should too.
>
> Hmm, it probably doesn't even make sense to continue the loop if is_freed
> is set as the same variable will be checked every time. :-/ Should just be
> a "break;" not a "continue;"

Also, I just realized it breaks if we update the 'c--' before the callback. :-/

I have to put this check *after* the callback check.

Reason being, the callback can say "this event doesn't get this file" and
return 0, which tells eventfs to skip this file.

For example, we have

# ls /sys/kernel/tracing/events/ftrace/function
format hist hist_debug id inject

and

# ls /sys/kernel/tracing/events/sched/sched_switch/
enable filter format hist hist_debug id inject trigger

The "ftrace" event files are for information only. They describe the
internal events. For example, the "function" event is what is written into
the ring buffer by the function tracer. That event is not enabled by the
events directory. It is only enabled when "function" is written into
current_tracer.

Same for filtering. The filter logic for function events is done by the
"set_ftrace_filter" file in tracefs.

But the "sched_switch" event is enabled and filtered by the eventfs
directory. Here we see "enable" and "filter" files that the user could
write into to enabled and/or filter the sched_switch event.

Now because the "function" and "sched_switch" registered the same way, the
callback that handles the two has:

static int event_callback(const char *name, umode_t *mode, void **data,
const struct file_operations **fops)
{
struct trace_event_file *file = *data;
struct trace_event_call *call = file->event_call;

[..]
if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
if (call->class->reg && strcmp(name, "enable") == 0) {
*mode = TRACE_MODE_WRITE;
*fops = &ftrace_enable_fops;
return 1;
}

if (strcmp(name, "filter") == 0) {
*mode = TRACE_MODE_WRITE;
*fops = &ftrace_event_filter_fops;
return 1;
}
}
[..]
return 0;
}

Where the function event has that "TRACE_EVENT_FL_IGNORE_ENABLE" flag set,
so when the entry for "enable" or "filter" is passed to it, it returns 0.

Before, where we had c-- after the callback, we had:

# ls /sys/kernel/tracing/events/ftrace/function
format hist hist_debug id inject

After changing the c-- to before the callback I now get:

# ls /sys/kernel/tracing/events/ftrace/function/
format hist hist hist_debug hist_debug id inject inject

Where the missing "enable" and "filter" files caused the indexing to be
off, and the ls repeated "hist" and "hist_debug" because of it. Oh, and the
function event doesn't have "trigger" either which is why we get two
"inject" files.

I have to move the c-- update down. I can still do it before creating the
dentry though, as if that fails, we should just bail.

-- Steve