Re: [PATCH v2] vfs: shave work on failed file open

From: Mateusz Guzik
Date: Fri Sep 29 2023 - 17:23:11 EST


On 9/29/23, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> On Fri, Sep 29, 2023 at 03:31:29PM +0200, Jann Horn wrote:
>> On Fri, Sep 29, 2023 at 11:20 AM Christian Brauner <brauner@xxxxxxxxxx>
>> wrote:
>> > > But yes, that protection would be broken by SLAB_TYPESAFE_BY_RCU,
>> > > since then the "f_count is zero" is no longer a final thing.
>> >
>> > I've tried coming up with a patch that is simple enough so the pattern
>> > is easy to follow and then converting all places to rely on a pattern
>> > that combine lookup_fd_rcu() or similar with get_file_rcu(). The
>> > obvious
>> > thing is that we'll force a few places to now always acquire a
>> > reference
>> > when they don't really need one right now and that already may cause
>> > performance issues.
>>
>> (Those places are probably used way less often than the hot
>> open/fget/close paths though.)
>>
>> > We also can't fully get rid of plain get_file_rcu() uses itself because
>> > of users such as mm->exe_file. They don't go from one of the rcu
>> > fdtable
>> > lookup helpers to the struct file obviously. They rcu replace the file
>> > pointer in their struct ofc so we could change get_file_rcu() to take a
>> > struct file __rcu **f and then comparing that the passed in pointer
>> > hasn't changed before we managed to do atomic_long_inc_not_zero().
>> > Which
>> > afaict should work for such cases.
>> >
>> > But overall we would introduce a fairly big and at the same time subtle
>> > semantic change. The idea is pretty neat and it was fun to do but I'm
>> > just not convinced we should do it given how ubiquitous struct file is
>> > used and now to make the semanics even more special by allowing
>> > refcounts.
>> >
>> > I've kept your original release_empty_file() proposal in vfs.misc which
>> > I think is a really nice change.
>> >
>> > Let me know if you all passionately disagree. ;)
>
> So I'm appending the patch I had played with and a fix from Jann on top.
> @Linus, if you have an opinion, let me know what you think.
>
> Also available here:
> https://gitlab.com/brauner/linux/-/commits/vfs.file.rcu
>
> Might be interesting if this could be perfed to see if there is any real
> gain for workloads with massive numbers of fds.
>

I would feel safer with a guaranteed way to tell that the file was reallocated.

I think this could track allocs/frees with a sequence counter embedded
into the object, say odd means deallocated and even means allocated.

Then you would know for a fact whether you raced with the file getting
whacked and would never have to wonder if you double-checked
everything you needed (like that f_mode) thing.

This would also mean that consumers which get away with poking around
the file without getting a ref could still do it, this is at least
true for tid_fd_mode. All of them would need patching though.

Extending struct file is not ideal by any means, but the good news is that:
1. there is a 4 byte hole in there, if one is fine with an int-sized counter
2. if one insists on 8 bytes, the struct is 232 bytes on my kernel
(debian). still some room up to 256, so it may be tolerable?

--
Mateusz Guzik <mjguzik gmail.com>