Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()

From: Linus Torvalds
Date: Wed Jan 03 2024 - 13:12:38 EST


On Wed, 3 Jan 2024 at 07:24, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> Instead, just have eventfs have its own iterate_shared callback function
> that will fill in the dent entries. This simplifies the code quite a bit.

Much better. Now eventfs looks more like a real filesystem, and less
like an eldritch horror monster that is parts of dcache tackled onto a
pseudo-filesystem.

However, one request, and one nit:

> Also, remove the "lookup" parameter to the create_file/dir_dentry() and
> always have it return a dentry that has its ref count incremented, and
> have the caller call the dput. This simplifies that code as well.

Can you please do that as a separate patch, where the first patch just
cleans up the directory iteration, and the second patch then goes "now
there are no more callers that have the 'lookup' argument set to
false".

Because as-is, the patch is kind of two things mixed up.

The small nit is this:

> +static int eventfs_iterate(struct file *file, struct dir_context *ctx)
> {
> + /*
> + * Need to create the dentries and inodes to have a consistent
> + * inode number.
> + */
> list_for_each_entry_srcu(ei_child, &ei->children, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - d = create_dir_dentry(ei, ei_child, parent, false);
> - if (d) {
> - ret = add_dentries(&dentries, d, cnt);
> - if (ret < 0)
> - break;
> - cnt++;
> +
> + if (ei_child->is_freed)
> + continue;
> +
> + name = ei_child->name;
> +
> + dentry = create_dir_dentry(ei, ei_child, ei_dentry);
> + if (!dentry)
> + goto out;
> + ino = dentry->d_inode->i_ino;
> + dput(dentry);
> +
> + if (c > 0) {
> + c--;
> + continue;
> }

Just move this "is the position before this name" up to the top of the
loop. Even above the "is_freed" part.

Let's just always count all the entries in the child list.

And same for the ei->nr_entries loop:

> for (i = 0; i < ei->nr_entries; i++) {

where there's no point in creating that dentry just to look up the
inode number, only to then decide "Oh, we already iterated past this
part, so let's not do anything with it".

This wouldn't seem to matter much with a big enough getdents buffer
(which is the normal user level behavior), but it actually does,
because we don't keep track of "we have read to the end of the
directory".

So every readdir ends up effectively doing getdents at least twice:
once to read the directory entries, and then once to just be told
"that was all".

End result: you should strive very hard to *not* waste time on the
directory entries that have already been read, and are less than
'ctx->pos'.

Linus