Re: [GIT PULL] bcachefs

From: Christian Brauner
Date: Fri Aug 11 2023 - 04:14:07 EST


On Fri, Aug 11, 2023 at 10:10:42AM +0200, Jan Kara wrote:
> On Thu 10-08-23 22:47:03, Kent Overstreet wrote:
> > On Thu, Aug 10, 2023 at 07:52:05PM +0200, Jan Kara wrote:
> > > On Thu 10-08-23 11:54:53, Kent Overstreet wrote:
> > > > > And there clearly is something very strange going on with superblock
> > > > > handling
> > > >
> > > > This deserves an explanation because sget() is a bit nutty.
> > > >
> > > > The way sget() is conventionally used for block device filesystems, the
> > > > block device open _isn't actually exclusive_ - sure, FMODE_EXCL is used,
> > > > but the holder is the fs type pointer, so it won't exclude with other
> > > > opens of the same fs type.
> > > >
> > > > That means the only protection from multiple opens scribbling over each
> > > > other is sget() itself - but if the bdev handle ever outlives the
> > > > superblock we're completely screwed; that's a silent data corruption bug
> > > > that we can't easily catch, and if the filesystem teardown path has any
> > > > asynchronous stuff going on (and of course it does) that's not a hard
> > > > mistake to make. I've observed at least one bug that looked suspiciously
> > > > like that, but I don't think I quite pinned it down at the time.
> > >
> > > This is just being changed - check Christian's VFS tree. There are patches
> > > that make sget() use superblock pointer as a bdev holder so the reuse
> > > you're speaking about isn't a problem anymore.
> >
> > So then the question is what do you use for identifying the superblock,
> > and you're switching to the dev_t - interesting.
> >
> > Are we 100% sure that will never break, that a dev_t will always
> > identify a unique block_device? Namespacing has been changing things.
>
> Yes, dev_t is a unique identifier of the device, we rely on that in
> multiple places, block device open comes to mind as the first. You're
> right namespacing changes things but we implement that as changing what
> gets presented to userspace via some mapping layer while the kernel keeps
> using globally unique identifiers.

Full device namespacing is not on the horizon at all. We've looked into
this years ago and it woud be a giant effort that would effect nearly
everything if the properly. So even if, there would be so many changes
required that reliance on dev_t in the VFS would be the least of our
problems.