Re: [PATCH 1/3] nfsd: use __fput_sync() to avoid delayed closing of files.

From: NeilBrown
Date: Sun Dec 10 2023 - 17:59:38 EST


On Sat, 09 Dec 2023, Chuck Lever wrote:
> On Fri, Dec 08, 2023 at 02:27:26PM +1100, NeilBrown wrote:
> > Calling fput() directly or through filp_close() from a kernel thread like
> > nfsd causes the final __fput() (if necessary) to be called from a
> > workqueue. This means that nfsd is not forced to wait for any work to
> > complete. If the ->release or ->destroy_inode function is slow for any
> > reason, this can result in nfsd closing files more quickly than the
> > workqueue can complete the close and the queue of pending closes can
> > grow without bounds (30 million has been seen at one customer site,
> > though this was in part due to a slowness in xfs which has since been
> > fixed).
> >
> > nfsd does not need this.
>
> That is technically true, but IIUC, there is only one case where a
> synchronous close matters for the backlog problem, and that's when
> nfsd_file_free() is called from nfsd_file_put(). AFAICT all other
> call sites (except rename) are error paths, so there aren't negative
> consequences for the lack of synchronous wait there...

What you say is technically true but it isn't the way I see it.

Firstly I should clarify that __fput_sync() is *not* a flushing close as
you describe it below.
All it does, apart from some trivial book-keeping, is to call ->release
and possibly ->destroy_inode immediately rather than shunting them off
to another thread.
Apparently ->release sometimes does something that can deadlock with
some kernel threads or if some awkward locks are held, so the whole
final __fput is delayed by default. But this does not apply to nfsd.
Standard fput() is really the wrong interface for nfsd to use.
It should use __fput_sync() (which shouldn't have such a scary name).
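
To make the difference concrete, here is a rough userspace model of the
two behaviours. It is only a sketch, not kernel code: every toy_* name is
made up for illustration, and the single linked list merely stands in for
the delayed-fput machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names exist in the kernel. */
struct toy_file {
	int count;              /* reference count */
	int released;           /* set once the final put has run */
	struct toy_file *next;  /* link on the deferred list */
};

static struct toy_file *toy_deferred;   /* stands in for the delayed list */
static size_t toy_backlog;              /* final puts still pending */

/* Models the work of the final __fput(): ->release and friends. */
static void toy_final_put(struct toy_file *f)
{
	f->released = 1;
}

/* Models fput() from a kernel thread: the final put is queued, not run. */
static void toy_fput(struct toy_file *f)
{
	assert(f->count > 0);
	if (--f->count == 0) {
		f->next = toy_deferred;
		toy_deferred = f;
		toy_backlog++;
	}
}

/* Models __fput_sync(): the final put runs in the caller's context. */
static void toy_fput_sync(struct toy_file *f)
{
	assert(f->count > 0);
	if (--f->count == 0)
		toy_final_put(f);
}

/* Models the worker thread handling at most 'budget' queued final puts. */
static void toy_run_worker(size_t budget)
{
	while (toy_deferred && budget > 0) {
		struct toy_file *f = toy_deferred;

		toy_deferred = f->next;
		toy_backlog--;
		budget--;
		toy_final_put(f);
	}
}
```

In this model a caller that uses toy_fput() faster than toy_run_worker()
drains the list sees toy_backlog grow without limit, while a caller that
uses toy_fput_sync() never adds to it.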

The comment above flush_delayed_fput() seems to suggest that unmounting
is a core issue. Maybe the fact that __fput() can call
dissolve_on_fput() is a reason why it is sometimes safer to leave the
work to later. But I don't see that applying to nfsd.

Of course a ->release function *could* do synchronous writes just like
the XFS ->destroy_inode function used to do synchronous reads.
I don't think we should ever try to hide that by putting it in
a workqueue. It's probably a bug and it is best if bugs are visible.

Note that the XFS ->release function does call filemap_flush() in some
cases, but that is an async flush, so __fput_sync doesn't wait for the
flush to complete.

The way I see this patch is that fput() is the wrong interface for nfsd
to use, __fput_sync is the right interface. So we should change. 1
patch.
The details about exhausting memory explain a particular symptom that
motivated the examination which revealed that nfsd was using the wrong
interface.

If we have nfsd sometimes using fput() and sometimes __fput_sync, then
we need to have clear rules for when to use which. It is much easier to
have a simple rule: always use __fput_sync().

I'm certainly happy to revise function documentation and provide
wrapper functions if needed.

It might be good to have

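void filp_close_sync(struct file *f)
{
	get_file(f);		/* extra ref so filp_close() is not the final put */
	filp_close(f, NULL);	/* filp_close() takes an fl_owner_t; NULL as nfsd passes */
	__fput_sync(f);		/* drop the last ref in this context */
}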

but as that would only be called once, it was hard to motivate.
Having it in linux/fs.h would be nice.
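
The point of the get_file() in that wrapper may not be obvious, so here is
a small userspace model of the reference counting. Again this is only a
sketch under my own assumptions: the model_* names are invented, and
"deferred" just records that the final put would have been handed to
another thread.

```c
#include <assert.h>

/* Hypothetical userspace model; these are not kernel APIs. */
struct model_file {
	int count;      /* reference count */
	int flushed;    /* per-close work (such as ->flush) has run */
	int released;   /* final put ran in the caller's context */
	int deferred;   /* final put was handed to another thread */
};

static void model_get_file(struct model_file *f)
{
	f->count++;
}

/* Ordinary put: dropping the last reference defers the final work. */
static void model_fput(struct model_file *f)
{
	assert(f->count > 0);
	if (--f->count == 0)
		f->deferred = 1;
}

/* Synchronous put: dropping the last reference does the work now. */
static void model_fput_sync(struct model_file *f)
{
	assert(f->count > 0);
	if (--f->count == 0)
		f->released = 1;
}

/* Models filp_close(): per-close work, then an ordinary put. */
static void model_filp_close(struct model_file *f)
{
	f->flushed = 1;
	model_fput(f);
}

/* Models the proposed wrapper. */
static void model_filp_close_sync(struct model_file *f)
{
	model_get_file(f);      /* so the put inside the close is not final */
	model_filp_close(f);
	model_fput_sync(f);     /* the last reference is dropped here */
}
```

With a file whose count starts at 1, model_filp_close() alone leaves the
final put deferred, whereas model_filp_close_sync() still does the
per-close work but completes the final put before it returns.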

Similarly we could wrap __fput_sync() in a more friendly name, but
it would be better if we actually renamed the function.

void fput_now(struct file *f)
{
	__fput_sync(f);
}

??

Thanks,
NeilBrown