Re: [PATCH v3 3/6] ext4: Online defrag not supported with DAX

From: Dave Chinner
Date: Wed Feb 17 2016 - 19:12:32 EST


On Wed, Feb 17, 2016 at 02:50:37PM -0700, Ross Zwisler wrote:
> On Tue, Feb 16, 2016 at 08:34:16PM -0700, Ross Zwisler wrote:
> > Online defrag operations for ext4 are hard coded to use the page cache.
> > See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()
> >
> > When combined with DAX I/O, which circumvents the page cache, this can
> > result in data corruption. This was observed with xfstests ext4/307 and
> > ext4/308.
> >
> > Fix this by only allowing online defrag for non-DAX files.
>
> Jan,
>
> Thinking about this a bit more, it's probably the case that the data
> corruption I was observing was due to us skipping the writeback of the dirty
> page cache pages because S_DAX was set.
>
> I do think we have a problem with defrag because it is doing the extent
> swapping using the page cache, and we won't flush the dirty pages due to
> S_DAX being set.
>
> This patch is the quick and easy answer, and is perhaps appropriate for v4.5.
>
> Looking forward, though, what do you think the correct solution is? Making an
> extent swapper that doesn't use the page cache (as I believe XFS has? see
> xfs_swap_extents()),

XFS does the data copy in userspace using direct IO so we don't
care about whether DAX is enabled or not on either the source or
destination inode. i.e. xfs_swap_extents() is a pure
metadata operation, swapping the entire extent tree between two
inodes if the source data has not changed while the copy was in
progress.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx