Re: [PATCH] fs: use __fput_sync in close(2)

From: Linus Torvalds
Date: Tue Aug 08 2023 - 14:48:43 EST


On Mon, 7 Aug 2023 at 22:57, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> Taking a quick look at the history it appears that fput was always
> synchronous [..]

Indeed. Synchronous used to be the only case.

The reason it's async now is because several drivers etc do the final
close from nasty contexts, so 'fput()' needed to be async for the
general case.

> All 3 issues taken together says that a synchronous fput is a
> loaded foot gun that must be used very carefully. That said
> close(2) does seem to be a reliably safe place to be synchronous.

Yes.

That said, I detest Mateusz' patch. I hate these kinds of "do
different things based on flags" interfaces. Particularly when it
spreads out like this.

So I do like having close() be synchronous, because we actually do
have correctness issues wrt the close having completed properly by the
time we return to user space, so we have that "task_work_add()" there
that will force the synchronization anyway before we return.

So the system call case is indeed a special case. Arguably
close_range() could be too, but honestly, once you start doing ranges
of file descriptors, you are (a) doint something fairly unusual, and
(b) the "queue them up on the task work" might actually be a *good*
thing.

It's definitely not a good thing for the single-fd-close case, though.

But even if we want to do this - and I have absolutely no objections
to it conceptually as per above - we need to be a lot more surgical
about it, and not pass stupid flags around.

Here's a TOTALLY UNTESTED(!) patch that I think effectively does what
Mateusz wants done, but does it all within just fs/open.c and only for
the obvious context of the close() system call itself.

All it needs is to just split out the "flush" part from filp_close(),
and we already had all the other infrastructure for this operation.

Mateusz, two questions:

(a) does this patch work for you?

(b) do you have numbers for this all?

and if it all looks good I have no problems with this kind of much
more targeted and obvious patch.

Again: TOTALLY UNTESTED. It looks completely obvious, but mistakes happen.

Linus
fs/open.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e6ead0f19964..5fe427e8685c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1503,7 +1503,7 @@ SYSCALL_DEFINE2(creat, const char __user *, pathname, umode_t, mode)
* "id" is the POSIX thread ID. We use the
* files pointer for this..
*/
-int filp_close(struct file *filp, fl_owner_t id)
+static int filp_flush(struct file *filp, fl_owner_t id)
{
int retval = 0;

@@ -1520,10 +1520,15 @@ int filp_close(struct file *filp, fl_owner_t id)
dnotify_flush(filp, id);
locks_remove_posix(filp, id);
}
- fput(filp);
return retval;
}

+int filp_close(struct file *filp, fl_owner_t id)
+{
+ int retval = filp_flush(filp, id);
+ fput(filp);
+ return retval;
+}
EXPORT_SYMBOL(filp_close);

/*
@@ -1533,7 +1538,18 @@ EXPORT_SYMBOL(filp_close);
*/
SYSCALL_DEFINE1(close, unsigned int, fd)
{
- int retval = close_fd(fd);
+ struct file *file = close_fd_get_file(fd);
+ int retval;
+
+ if (!file)
+ return -EBADF;
+ retval = filp_flush(file, current->files);
+
+ /*
+ * We're returning to user space. Don't bother
+ * with any delayed fput() cases.
+ */
+ __fput_sync(file);

/* can't restart close syscall because file table entry was cleared */
if (unlikely(retval == -ERESTARTSYS ||