Re: [GIT PULL] pidfd updates

From: Al Viro
Date: Tue Apr 25 2023 - 02:04:39 EST


On Mon, Apr 24, 2023 at 01:24:24PM -0700, Linus Torvalds wrote:

> But I really think a potentially much nicer model would have been to
> extend our "get_unused_fd_flags()" model.
>
> IOW, we could have instead marked the 'struct file *' in the file
> descriptor table as being "not ready yet".
>
> I wonder how nasty it would have been to have the low bit of the
> 'struct file *' mark "not ready to be used yet" or something similar.
> You already can't just access the 'fdt->fd[]' array willy-nilly since
> we have both normal RCU issues _and_ the somewhat unusual spectre
> array indexing issues.
>
> So looking around with
>
> git grep -e '->fd\['
>
> we seem to be pretty good about that and it probably wouldn't be too
> horrid to add a "check low bit isn't set" to the rules.
>
> Then pidfd_prepare() could actually install the file pointer in the fd
> table, just marked as "not ready", and then instead of "fd_install()",
> you'd have "fd_expose(fd)" or something.
>
> I dislike interfaces that return two different things. Particularly
> ones that are supposed to be there to make things easy for the user. I
> think your pidfd_prepare() helper fails that "make it easy to use"
> test.
>
> Hmm?

I'm not fond of "return two things" kind of helpers, but I'm even less
fond of "return fd, file is already there" ones, TBH. {__,}pidfd_prepare()
users are thankfully very limited in the things they do to the file that
had been returned, but that really invites abuse.

The deeper in call chain we mess with descriptor table, the more painful it
gets, IME.

Speaking of {__,}pidfd_prepare(), I wonder if we wouldn't be better off
with get_unused_fd_flags() lifted into the callers - all three of those
(fanotify copy_event_to_user(), copy_process() and pidfd_create()).
Switch from anon_inode_getfd() to anon_inode_getfile() certainly
made sense, ditto for combining it with get_pid(), but mixing
get_unused_fd_flags() into that is a mistake, IMO.

As for your suggestion... let's see what it leads to.

Suppose we add such entries (reserved, hold a reference to file,
marked "not yet available" in the LSB). From the current tree POV those
would be equivalent to descriptor already reserved, but fd_install() not
done. So behaviour of existing primitives should be the same as for this
situation, except for fd_install() and put_unused_fd().

* pick_file(), __fget_files_rcu(), iterate_fd(), files_lookup_fd_raw(),
loop in dup_fd(), io_close() - treat odd pointers as NULL.
* close_files() should, AFAICS, treat an odd pointer as "should never
happen" (and that xchg() in there needs to go anyway - it's pointless, since
we are freeing the array immediately afterwards).
* do_close_on_exec() should probably treat them as "should never happen".
* do_dup2() - odd value should be treated as -EBUSY.
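
For illustration only, the rules in the list above can be modelled in a
small userspace sketch. Nothing here is kernel code: toy_file, toy_fdtable
and every toy_*() helper are hypothetical names made up for this example,
and locking, RCU and the spectre-safe indexing are left out entirely. It
assumes file objects are at least 2-byte aligned, so the low bit of the
pointer is free to use as the "not ready" mark.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in the kernel. */
struct toy_file { int refcount; };

#define TOY_NFDS      8
#define TOY_NOT_READY ((uintptr_t)1)	/* LSB of the stored pointer */

struct toy_fdtable { uintptr_t fd[TOY_NFDS]; };

static int toy_is_not_ready(uintptr_t v)
{
	return (v & TOY_NOT_READY) != 0;
}

/* Reserve the lowest free slot and park the file there, marked "not ready". */
static int toy_fd_reserve(struct toy_fdtable *t, struct toy_file *f)
{
	for (int fd = 0; fd < TOY_NFDS; fd++) {
		if (t->fd[fd] == 0) {
			t->fd[fd] = (uintptr_t)f | TOY_NOT_READY;
			return fd;
		}
	}
	return -EMFILE;
}

/* Lookup path: an odd pointer is treated exactly like NULL. */
static struct toy_file *toy_lookup(const struct toy_fdtable *t, int fd)
{
	if (fd < 0 || fd >= TOY_NFDS)
		return NULL;
	if (toy_is_not_ready(t->fd[fd]))
		return NULL;
	return (struct toy_file *)t->fd[fd];
}

/* The "fd_expose()" idea: clear the mark so lookups start seeing the file. */
static int toy_fd_expose(struct toy_fdtable *t, int fd)
{
	if (fd < 0 || fd >= TOY_NFDS || !toy_is_not_ready(t->fd[fd]))
		return -EBADF;
	t->fd[fd] &= ~TOY_NOT_READY;
	return 0;
}

/* dup2-like helper: a not-ready target slot yields -EBUSY. */
static int toy_dup2(struct toy_fdtable *t, int oldfd, int newfd)
{
	struct toy_file *f = toy_lookup(t, oldfd);
	struct toy_file *old;

	if (!f || newfd < 0 || newfd >= TOY_NFDS)
		return -EBADF;
	if (toy_is_not_ready(t->fd[newfd]))
		return -EBUSY;
	old = (struct toy_file *)t->fd[newfd];
	if (old)
		old->refcount--;	/* drop the file being replaced */
	f->refcount++;
	t->fd[newfd] = (uintptr_t)f;
	return newfd;
}
```

The point of the model is that a not-ready slot looks, to every lookup, the
same as a descriptor that has been reserved but not yet installed; only the
expose step and the dup2 target check ever look at the low bit.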

The interesting part, of course, is how to legitimize (or dispose of) such
a beast. The former is your "fd_expose()" - parallel to fd_install(),
AFAICS. The latter... another primitive that would
	grab ->file_lock
	pick_file() variant that *expects* an odd value
	drop ->file_lock
	clear LSB and pass to fput().
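
A rough userspace sketch of that disposal sequence, for illustration only.
All names (toy_files, toy_pick_not_ready(), toy_fd_discard(), toy_fput())
are hypothetical, the spinlock is replaced by a plain flag that only checks
lock/unlock pairing, and the final put is modelled as a bare refcount
decrement rather than anything resembling the real fput().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in the kernel. */
struct toy_file { int refcount; };

#define TOY_NFDS      8
#define TOY_NOT_READY ((uintptr_t)1)	/* LSB of the stored pointer */

struct toy_files {
	int lock_held;			/* stand-in for the table spinlock */
	uintptr_t fd[TOY_NFDS];
};

static void toy_lock(struct toy_files *fs)
{
	assert(!fs->lock_held);
	fs->lock_held = 1;
}

static void toy_unlock(struct toy_files *fs)
{
	assert(fs->lock_held);
	fs->lock_held = 0;
}

/* Model of the final reference drop. */
static void toy_fput(struct toy_file *f)
{
	f->refcount--;
}

/*
 * pick_file()-like variant that *expects* an odd value: detach the slot
 * and return the raw tagged value, or 0 if the slot is empty or already
 * holds an exposed (even) pointer. Caller must hold the lock.
 */
static uintptr_t toy_pick_not_ready(struct toy_files *fs, int fd)
{
	uintptr_t v;

	assert(fs->lock_held);
	if (fd < 0 || fd >= TOY_NFDS)
		return 0;
	v = fs->fd[fd];
	if (!(v & TOY_NOT_READY))
		return 0;
	fs->fd[fd] = 0;
	return v;
}

/* Dispose of a not-yet-exposed descriptor: lock, pick, unlock, untag, put. */
static int toy_fd_discard(struct toy_files *fs, int fd)
{
	uintptr_t v;

	toy_lock(fs);
	v = toy_pick_not_ready(fs, fd);
	toy_unlock(fs);
	if (!v)
		return -1;
	toy_fput((struct toy_file *)(v & ~TOY_NOT_READY));
	return 0;
}
```

Note that the reference is dropped only after the lock has been released,
matching the order of the steps above.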

It's doable, but AFAICS doesn't make callers all that happier...