Re: [PATCH v2] fuse: In fuse_flush only wait if someone wants the return code

From: Miklos Szeredi
Date: Fri Sep 30 2022 - 09:35:43 EST


On Thu, 29 Sept 2022 at 18:40, Tycho Andersen <tycho@tycho.pizza> wrote:
>
> If a fuse filesystem is mounted inside a container, there is a problem
> during pid namespace destruction. The scenario is:
>
> 1. task (a thread in the fuse server, with a fuse file open) starts
> exiting, does exit_signals(), goes into fuse_flush() -> wait

Can't the same happen through

fuse_flush -> fuse_sync_writes -> fuse_set_nowrite -> wait

?
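For reference, as far as I can tell that wait is on the inode's write counter. fuse_set_nowrite() biases fi->writectr by FUSE_NOWRITE and then sleeps until the in-flight writes have drained it back, so it can only finish once those writes complete. Below is a toy user-space model of that bias. It assumes FUSE_NOWRITE is INT_MIN and that each completion simply decrements the counter. The toy_* names are hypothetical, and this is not the kernel code:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's FUSE_NOWRITE (INT_MIN); toy model only. */
#define TOY_NOWRITE INT_MIN

struct toy_inode {
	int writectr;	/* >= 0: number of in-flight writepage requests */
};

/* Start of the "nowrite" section: bias the counter to block new writes. */
static void toy_set_nowrite(struct toy_inode *fi)
{
	assert(fi->writectr >= 0);
	fi->writectr += TOY_NOWRITE;
}

/* The condition the waiter sleeps on: every in-flight write has drained. */
static bool toy_nowrite_wait_done(const struct toy_inode *fi)
{
	return fi->writectr == TOY_NOWRITE;
}

/* One in-flight write completes (normally driven by the server's reply). */
static void toy_write_completed(struct toy_inode *fi)
{
	fi->writectr--;
}

/* End of the "nowrite" section: drop the bias again. */
static void toy_release_nowrite(struct toy_inode *fi)
{
	assert(toy_nowrite_wait_done(fi));
	fi->writectr = 0;
}
```

If nothing ever completes the outstanding writes, toy_nowrite_wait_done() never becomes true. That is the same kind of hang as the FLUSH wait.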


> 2. fuse daemon gets killed, tries to wake everyone up
> 3. task from 1 is stuck because complete_signal() doesn't wake it up, since
> it has PF_EXITING.
>
> The result is that the thread will never be woken up, and pid namespace
> destruction will block indefinitely.
>
> To add insult to injury, nobody is waiting for these return codes, since
> the pid namespace is being destroyed.
>
> To fix this, let's not block on flush operations when the current task has
> PF_EXITING.
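Below is a deliberately simplified user-space model of steps 2-3 and of the proposed rule. It only captures the PF_EXITING check described in the changelog (a task with that flag is not woken by the fatal signal). It does not model the real request_wait_answer() logic. TOY_PF_EXITING is assumed to mirror the kernel's PF_EXITING value, and all toy_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror PF_EXITING in include/linux/sched.h; toy model only. */
#define TOY_PF_EXITING 0x00000004u

struct toy_task {
	unsigned int flags;
	bool woken;	/* a fatal signal was delivered and woke the task */
};

/* Models the PF_EXITING check that makes complete_signal() skip a task. */
static bool toy_wants_signal(const struct toy_task *t)
{
	return !(t->flags & TOY_PF_EXITING);
}

/* Step 2: the server is killed and everyone gets a fatal signal. */
static void toy_kill(struct toy_task *t)
{
	if (toy_wants_signal(t))
		t->woken = true;
}

/* Before the patch: the flusher sleeps until a reply or a wake-up. */
static bool toy_sync_flush_returns(const struct toy_task *t, bool reply_arrived)
{
	return reply_arrived || t->woken;
}

/* The patch's rule: an exiting task sends FLUSH in the background and never sleeps. */
static bool toy_flush_waits(const struct toy_task *t)
{
	return !(t->flags & TOY_PF_EXITING);
}
```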
>
> This does change the semantics slightly: the wait here is for posix locks
> to be unlocked, so the task will exit before things are unlocked. To quote
> Miklos: https://lore.kernel.org/all/CAJfpegsTmiO-sKaBLgoVT4WxDXBkRES=HF1YmQN1ES7gfJEJ+w@xxxxxxxxxxxxxx/
>
> > "remote" posix locks are almost never used due to problems like this,
> > so I think it's safe to do this.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
> Link: https://lore.kernel.org/all/YrShFXRLtRt6T%2Fj+@risky/
> ---
> v2: drop the fuse_flush_async() function and just re-use the already
> prepared args; add a description of the problem+note about posix locks
> ---
> fs/fuse/file.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 05caa2b9272e..20bbe3e1afc7 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -464,6 +464,34 @@ static void fuse_sync_writes(struct inode *inode)
> 	fuse_release_nowrite(inode);
> }
>
> +struct fuse_flush_args {
> +	struct fuse_args args;
> +	struct fuse_flush_in inarg;
> +	struct inode *inode;
> +	struct fuse_file *ff;
> +};
> +
> +static void fuse_flush_end(struct fuse_mount *fm, struct fuse_args *args, int err)
> +{
> +	struct fuse_flush_args *fa = container_of(args, typeof(*fa), args);
> +
> +	if (err == -ENOSYS) {
> +		fm->fc->no_flush = 1;
> +		err = 0;
> +	}
> +
> +	/*
> +	 * In memory i_blocks is not maintained by fuse, if writeback cache is
> +	 * enabled, i_blocks from cached attr may not be accurate.
> +	 */
> +	if (!err && fm->fc->writeback_cache)
> +		fuse_invalidate_attr_mask(fa->inode, STATX_BLOCKS);

This is still duplicating code from fuse_flush(); can you please create a helper?
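Below is one possible shape for such a helper, sketched in user space with stand-in types. The toy_* names, and the idea of a single finish helper that takes the connection, the inode and the error, are hypothetical. This is not the code that was eventually merged. The sketch moves the ENOSYS and writeback-cache post-processing into one function that both the synchronous path and the async end callback can call. It uses container_of() to recover the wrapper struct from the embedded args, the same way fuse_flush_end() does in the quoted patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal container_of, like the kernel's but without its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for the real fuse structures; only the fields used here. */
struct toy_conn { bool no_flush; bool writeback_cache; };
struct toy_inode { bool blocks_invalidated; };
struct toy_args { int opcode; };

struct toy_flush_args {
	struct toy_args args;	/* embedded, like fuse_flush_args.args */
	struct toy_inode *inode;
};

/*
 * Hypothetical shared helper: the post-processing that the synchronous
 * flush path and the async completion callback would otherwise both
 * open-code.
 */
static int toy_flush_finish(struct toy_conn *fc, struct toy_inode *inode, int err)
{
	if (err == -ENOSYS) {
		fc->no_flush = true;	/* server doesn't implement FLUSH */
		err = 0;
	}
	/* with writeback cache the cached block count may be stale */
	if (!err && fc->writeback_cache)
		inode->blocks_invalidated = true;
	return err;
}

/* Async completion: recover the wrapper, then reuse the helper. */
static void toy_flush_end(struct toy_conn *fc, struct toy_args *args, int err)
{
	struct toy_flush_args *fa = container_of(args, struct toy_flush_args, args);

	toy_flush_finish(fc, fa->inode, err);
	/* freeing fa and dropping references is left out of this toy */
}
```

The synchronous path would return toy_flush_finish()'s result to the caller. The callback ignores it because nobody is waiting for it.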

Thanks,
Miklos