Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

From: Amir Goldstein
Date: Wed May 17 2023 - 02:52:03 EST


On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2023/5/2 17:07, Daniel Rosenberg wrote:
> > On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >>
> >>
> >> The security model needs to be thought about and documented. Think
> >> about this: the fuse server now delegates operations it would itself
> >> perform to the passthrough code in fuse. The permissions that would
> >> have been checked in the context of the fuse server are now checked in
> >> the context of the task performing the operation. The server may be
> >> able to bypass seccomp restrictions. Files that are open on the
> >> backing filesystem are now hidden (e.g. lsof won't find these), which
> >> allows the server to obfuscate accesses to backing files. Etc.
> >>
> >> These are not particularly worrying if the server is privileged, but
> >> fuse comes with the history of supporting unprivileged servers, so we
> >> should look at supporting passthrough with unprivileged servers as
> >> well.
> >>
> >
> > This is on my todo list. My current plan is to grab the creds that the
> > daemon uses to respond to FUSE_INIT. That should keep behavior fairly
> > similar. I'm not sure if there are cases where the fuse server is
> > operating under multiple contexts.
> > I don't currently have a plan for exposing open files via lsof. Every
> > such file should relate to one that will show up though. I haven't dug
> > into how that's set up, but I'm open to suggestions.
> >
> >> My other generic comment is that you should add justification for
> >> doing this in the first place. I guess it's mainly performance. So
> >> how can performance be won in real-life cases? It would also be good
> >> to measure the contribution of individual ops to that win. Is there
> >> another reason for this besides performance?
> >>
> >> Thanks,
> >> Miklos
> >
> > Our main concern with it is performance. We have some preliminary
> > numbers looking at the pure passthrough case. We've been testing using
> > a ramdrive on a somewhat slow machine, as that should highlight
> > differences more. We ran fio for sequential reads, and random
> > read/write. For sequential reads, we were seeing libfuse's
> > passthrough_hp take about a 50% hit, with fuse-bpf not being
> > detectably slower. For random read/write, we were seeing a roughly 90%
> > drop in performance from passthrough_hp, while fuse-bpf has about a 7%
> > drop in read and write speed. When we use a bpf that traces every
> > opcode, that performance hit increases to a roughly 1% drop in
> > sequential read performance, and a 20% drop in both read and write
> > performance for random read/write. We plan to make more complex bpf
> > examples, with fuse daemon equivalents to compare against.
> >
> > We have not looked closely at the impact of individual opcodes yet.
> >
> > There's also a potential ease-of-use benefit with fuse-bpf. If you're
> > implementing a fuse daemon that is largely mirroring a backing
> > filesystem, you only need to write code for the differences in
> > behavior. For instance, say you want to remove image metadata like
> > location. You could give bpf information on what range of data is
> > metadata, and zero out that section without having to handle any other
> > operations.
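
To make the range-zeroing idea quoted above concrete, here is a minimal
userspace sketch in plain C. It is not the FUSE-BPF API and not code from
the patch set: the helper name zero_metadata_range() and all of its
parameters are hypothetical, and it only shows the overlap arithmetic a
read filter would need, assuming the metadata range is known in advance
and that the offsets do not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only, not part of FUSE or FUSE-BPF).
 *
 * buf holds len bytes that were read from file offset read_off.
 * The metadata to hide occupies [meta_off, meta_off + meta_len) in the file.
 * Zero whatever part of buf overlaps that range and return the number of
 * bytes zeroed. Assumes read_off + len and meta_off + meta_len do not
 * overflow size_t.
 */
static size_t zero_metadata_range(unsigned char *buf, size_t len,
                                  size_t read_off,
                                  size_t meta_off, size_t meta_len)
{
    size_t read_end = read_off + len;
    size_t meta_end = meta_off + meta_len;

    /* Intersection of the read window and the metadata range. */
    size_t start = meta_off > read_off ? meta_off : read_off;
    size_t end = meta_end < read_end ? meta_end : read_end;

    assert(buf != NULL || len == 0);

    if (start >= end)
        return 0; /* the read does not touch the metadata */

    /* Translate the file offset back into a buffer offset before zeroing. */
    memset(buf + (start - read_off), 0, end - start);
    return end - start;
}
```

With something along these lines applied to read results, every other
operation could in principle be left to the backing filesystem, which is
the ease-of-use point made above.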
>
> A bit off topic (although I haven't really looked into the FUSE BPF
> internals): after roughly listening to this topic in the FS track last
> week, I'm not quite sure (at least in the long term) whether it might be
> better if eBPF-related filter/redirect functionality could land in the
> vfs or in a somewhat stackable fs, so that we could redirect/filter any
> sub-fstree in principle. It's just an open question and I have no real
> preference here, but do we really need BPF-filter functionality for each
> individual fs?

I think that is a valid question, but even if it makes sense, doing something
like this in vfs would be a much bigger project, with larger consequences for
performance, security and whatnot. So even if this ever happens (and that is
a very big if), using FUSE-BPF as a playground for this sort of stuff would
be a good idea.

This reminds me of union mounts - it made sense to have union mount
functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
turned out to be a much more practical solution.

>
> It sounds much like
> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>

Nice reference.
I must admit that I found it hard to understand what Windows filter drivers
can do compared to the FUSE-BPF design.
It'd be nice to get a comparison with what is planned for FUSE-BPF.

Interesting to note that there is a "legacy" Windows filter driver API,
so Windows didn't get everything right in the first API - that is especially
interesting to look at as repeating other people's mistakes would be a shame.

Thanks,
Amir.