Re: [PATCH v2] vfs: shave work on failed file open

From: Linus Torvalds
Date: Wed Sep 27 2023 - 16:28:17 EST


On Wed, 27 Sept 2023 at 11:32, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> put_cred showed up in file_free_rcu in d76b0d9b2d87 ("CRED: Use creds in
> file structs"). Commit message does not claim any dependency on this
> being in an rcu callback already and it looks like it was done this way
> because this was the ony spot with kmem_cache_free(filp_cachep, f)

Yes, that looks about right. So the rcu-freeing is almost an accident.

Btw, I think we could get rid of the RCU freeing of 'struct file *' entirely.

The way to fix it is

(a) make sure all f_count accesses are atomic ops (the one special
case is the "0 -> X" initialization, which is ok)

(b) make filp_cachep be SLAB_TYPESAFE_BY_RCU

because then get_file_rcu() can do the atomic_long_inc_not_zero()
knowing it's still a 'struct file *' while holding the RCU read lock
even if it was just free'd.

And __fget_files_rcu() will then re-check that it's the *right*
'struct file *' and do a fput() on it and re-try if it isn't. End
result: no need for any RCU freeing.

But the difference is that a *new* 'struct file *' might see a
temporary atomic increment / decrement of the file pointer because
another CPU is going through that __fget_files_rcu() dance.

Which is why "0 -> X" is ok to do as a "atomic_long_set()", but
everything else would need to be done as "atomic_long_inc()" etc.

Which all seems to be the case already, so with the put_cred() not
needing the RCU delay, I thing we really could do this patch (note:
independent of other issues, but makes your patch require that
"atomic_long_cmpxchg()" and the WARN_ON() should probably go away,
because it can actually happen).

That should help the normal file open/close case a bit, in that it
doesn't cause that extra RCU work.

Of course, on some loads it might be advantageous to do a delayed
de-allocation in some other RCU context, so ..

What do you think?

Linus

PS. And as always: ENTIRELY UNTESTED.
fs/file_table.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ee21b3da9d08..7b38ff7385cc 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -65,21 +65,21 @@ static void file_free_rcu(struct rcu_head *head)
{
struct file *f = container_of(head, struct file, f_rcuhead);

- put_cred(f->f_cred);
- if (unlikely(f->f_mode & FMODE_BACKING))
- kfree(backing_file(f));
- else
- kmem_cache_free(filp_cachep, f);
+ kfree(backing_file(f));
}

static inline void file_free(struct file *f)
{
security_file_free(f);
- if (unlikely(f->f_mode & FMODE_BACKING))
- path_put(backing_file_real_path(f));
if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
percpu_counter_dec(&nr_files);
- call_rcu(&f->f_rcuhead, file_free_rcu);
+ put_cred(f->f_cred);
+ if (unlikely(f->f_mode & FMODE_BACKING)) {
+ path_put(backing_file_real_path(f));
+ call_rcu(&f->f_rcuhead, file_free_rcu);
+ } else {
+ kmem_cache_free(filp_cachep, f);
+ }
}

/*
@@ -471,7 +471,8 @@ EXPORT_SYMBOL(__fput_sync);
void __init files_init(void)
{
filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
- SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
+ SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN
+ | SLAB_PANIC | SLAB_ACCOUNT, NULL);
percpu_counter_init(&nr_files, 0, GFP_KERNEL);
}