Re: [RFC PATCH v3 08/37] fuse: Add fuse-bpf, a stacked fs extension for FUSE

From: Amir Goldstein
Date: Tue May 02 2023 - 23:45:41 EST


On Tue, May 2, 2023 at 6:38 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Apr 17, 2023 at 06:40:08PM -0700, Daniel Rosenberg wrote:
> > Fuse-bpf provides a short circuit path for Fuse implementations that act
> > as a stacked filesystem. For cases that are directly unchanged,
> > operations are passed directly to the backing filesystem. Small
> > adjustments can be handled by bpf prefilters or postfilters, with the
> > option to fall back to userspace as needed.
>
> Here is my understanding of fuse-bpf design:
> - bpf progs can mostly read-only access fuse_args before and after proper vfs
> operation on a backing path/file/inode.
> - args are unconditionally prepared for bpf prog consumption, but progs won't
> be doing anything with them most of the time.
> - progs unfortunately cannot do any real work. they're nothing but simple filters.
> They can give 'green light' for a fuse_FOO op to be delegated to proper vfs_FOO
> in backing file. The logic in this patch keeps track of backing_path/file/inode.
> - in other words bpf side is "dumb", but it's telling kernel what to do with
> real things like path/file/inode and the kernel is doing real work and calling vfs_*.
>
> This design adds non-negligible overhead to fuse when CONFIG_FUSE_BPF is set.
> Comparing to trip to user space it's close to zero, but the cost of
> initialize_in/out + backing + finalize is not free.
> The patch 33 is especially odd.
> fuse has a traditional mechanism to upcall to user space with fuse_simple_request.
> The patch 33 allows bpf prog to return special return value and trigger two more
> fuse_bpf_simple_request-s to user space. Not clear why.
> It seems to me that the main assumption of the fuse bpf design is that bpf prog
> has to stay short and simple. It cannot do much other than reading and comparing
> strings with the help of dynptr.
> How about we allow bpf attach to fuse_simple_request and nothing else?
> All fuse ops call it anyway and cmd is already encoded in the args.
> Then let bpf prog read fuse_args as-is (without converting them to bpf_fuse_args)
> and avoid doing actual fuse_req to user space.
> Also allow bpf prog acquire and remember path/file/inode.
> The verifier is already smart enough to track that the prog is doing it safely
> without leaking references and what not.
> And, of course, allow bpf prog call vfs_* via kfuncs.
> In other words, instead of hard coding
> +#define bpf_fuse_backing(inode, io, out, \
> + initialize_in, initialize_out, \
> + backing, finalize, args...) \
> one for each fuse_ops in the kernel let bpf prog do the same but on demand.
> The biggest advantage is that this patch set instead of 95% on fuse side and 5% on bpf
> will become 5% addition to fuse code. All the logic will be handled purely by bpf.
> Right now you're limiting it to one backing_file per fuse_file.
> With bpf prog driving it the prog can keep multiple backing_files and shuffle
> access to them as prog decides.
> Instead of doing 'return BPF_FUSE_CONTINUE' the bpf progs will
> pass 'path' to kfunc bpf_vfs_open, then stash 'struct bpf_file*', etc.
> Probably will be easier to white board this idea during lsfmmbpf.
>

I have to admit that sounds a bit challenging, but I'm up for sitting
in front of that whiteboard :)

BTW, thanks Daniel (Borkmann) for sorting out the cross track
sessions for FS-BPF.
We have another FS only session on FUSE-BPF, but I feel there is plenty
to discuss on the FUSE-bypass part, as well as on the BPF part.
Same goes for the BPF iterators for filesystems session.

Thanks,
Amir.