Re: [PATCH 5/5] block: enable dax for raw block devices

From: Jan Kara
Date: Mon Oct 26 2015 - 03:21:11 EST


On Mon 26-10-15 17:23:19, Dave Chinner wrote:
> On Mon, Oct 26, 2015 at 11:48:06AM +0900, Dan Williams wrote:
> > 2/ Even if we get a new flag that lets the kernel know the app
> > understands DAX mappings, we shouldn't leave fsync broken. Can we
> > instead get by with a simple / big hammer solution? I.e.
>
> Because we don't physically have to write back data the problem is
> both simpler and more complex. The simplest solution is for the
> underlying block device to implement blkdev_issue_flush() correctly.
>
> i.e. if blkdev_issue_flush() behaves according to its required
> semantics - that all volatile cached data is flushed to stable
> storage - then fsync-on-DAX will work appropriately. As it is, this is
> needed for journal based filesystems to work correctly, as they are
> assuming that their journal writes are being treated correctly as
> REQ_FLUSH | REQ_FUA to ensure correct data/metadata/journal
> ordering is maintained....
>
> So, to begin with, this problem needs to be solved at the block
> device level. That's the simple, brute-force, big hammer solution to
> the problem, and it requires no changes at the filesystem level at
> all.

Completely agreed. Just make sure REQ_FLUSH and REQ_FUA work correctly for
pmem and the fsync(2) / sync(2) issues go away. Fs freezing is a
different story; that will likely need some coordination from the
filesystem layer (although with some luck we could keep it hidden in
fs/super.c and fs/block_dev.c). I can have a look at that once ext4 dax
support works unless someone beats me to it...
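
To make the userspace side of this concrete, here is a minimal sketch of the
pattern that depends on the flush semantics discussed above: store through a
shared mapping, then call msync(2) and fsync(2). The helper name
persist_via_mmap() and its arguments are hypothetical, made up for
illustration. It uses only standard POSIX calls and says nothing about how
the pmem driver handles REQ_FLUSH / REQ_FUA internally; on a DAX mapping the
durability of the store would rest on the device-level flush being correct.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch set): copy msg into a shared
 * mapping of path, then ask the kernel to make the data durable.
 * Returns 0 on success, -1 on error.
 */
int persist_via_mmap(const char *path, const char *msg)
{
    size_t len = strlen(msg) + 1;   /* include the trailing NUL */
    char *map;
    int fd, rc;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Size the file so the whole mapping is backed by it. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Plain CPU stores into the mapping; no write(2) is involved. */
    memcpy(map, msg, len);

    /* Synchronous writeback of the mapped range, then of the file. */
    rc = msync(map, len, MS_SYNC);
    if (rc == 0)
        rc = fsync(fd);

    munmap(map, len);
    close(fd);
    return rc;
}
```

An application would call something like persist_via_mmap("/some/file",
"record") and treat a zero return as "data is on stable storage". That is
exactly the promise that breaks if the flush underneath is a no-op.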

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR