Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

From: Dave Chinner
Date: Tue Jun 20 2017 - 22:21:04 EST


On Tue, Jun 20, 2017 at 06:24:03PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 21, 2017 at 09:53:46AM +1000, Dave Chinner wrote:
> > On Tue, Jun 20, 2017 at 09:17:36AM -0700, Dan Williams wrote:
> > > An immutable-extent DAX-file and a reflink-capable DAX-file are not
> > > mutually exclusive,
> >
> > Actually, they are mutually exclusive: when the immutable extent DAX
> > inode is breaking the extent sharing done during the reflink
> > operation, the copy-on-write operation requires allocating and
> > freeing extents on the inode that has immutable extents. Which, if
> > the inode really has immutable extents, cannot be done.
> >
> > That said, if the extent sharing is broken on the other side of the
> > reflink (i.e. the non-immutable inode created by the reflink) then
> > the extent map of the inode with immutable extents will remain
> > unchanged. i.e. there are two sides to this, and if you only see one
> > side you might come to the wrong conclusion.
> >
> > However, we cannot guarantee that no writes occur to the inode with
> > immutable extent maps (especially as the whole point is to allow
> > userspace writes and commits without the kernel being involved), so
> > extent sharing on immutable extent maps cannot be allowed...
>
> Just to play devil's advocate...
>
> /If/ you have rmap and /if/ you discover that there's only one
> IOMAP_IMMUTABLE file owning this same block and /if/ you're willing to
> relocate every other mapping on the whole filesystem, /then/ you could
> /in theory/ support shared daxfiles.

I figured that nobody apart from experienced filesystem developers
would understand the complexities of rmap and refcounts and how they
could be abused to do this. I also assumed that that people like you
would understand this is possible but completely impractical....

> However, that's so many on-disk metadata lookups to shove into a
> pagefault handler that I don't think anyone in XFSland would entertain
> such an ugly fantasy. You'd be making a lot of metadata requests, and
> you'd have to lock the rmapbt while grabbing inodes, which is insane.

Exactly. But while I understand this, consider the amount of assumed
filesystem and XFS knowledge in that one simple paragraph. Most
non-experts would have stopped *understanding* at "/If/ you have
rmap" and go away with the wrong ideas in their heads. Hence I now
tend to omit mentioning "possible but impractical" things in mixed
expertise discussions....

> Much easier to have a per-inode flag that says "the block map of this
> file does not change" and put up with the restricted semantics.

In a nutshell.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx