Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io

From: Dan Williams
Date: Thu May 05 2016 - 11:15:39 EST


On Thu, May 5, 2016 at 7:24 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Mon, May 02, 2016 at 06:41:51PM +0300, Boaz Harrosh wrote:
>> > All IO in a dax filesystem used to go through dax_do_io, which cannot
>> > handle media errors, and thus cannot provide a recovery path that can
>> > send a write through the driver to clear errors.
>> >
>> > Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
>> > path for DAX filesystems, use the same direct_IO path for both DAX and
>> > direct_io iocbs, but use the flags to identify when we are in O_DIRECT
>> > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
>> > direct_IO path instead of DAX.
>> >
>>
>> Really? What are your thinking here?
>>
>> What about all the current users of O_DIRECT, you have just made them
>> 4 times slower and "less concurrent*" then "buffred io" users. Since
>> direct_IO path will queue an IO request and all.
>> (And if it is not so slow then why do we need dax_do_io at all? [Rhetorical])
>>
>> I hate it that you overload the semantics of a known and expected
>> O_DIRECT flag, for special pmem quirks. This is an incompatible
>> and unrelated overload of the semantics of O_DIRECT.
>
> Agreed - makig O_DIRECT less direct than not having it is plain stupid,
> and I somehow missed this initially.

Of course I disagree because like Dave argues in the msync case we
should do the correct thing first and make it fast later, but also
like Dave this arguing in circles is getting tiresome.

> This whole DAX story turns into a major nightmare, and I fear all our
> hodge podge tweaks to the semantics aren't helping it.
>
> It seems like we simply need an explicit O_DAX for the read/write
> bypass if can't sort out the semantics (error, writer synchronization)
> just as we need a special flag for MMAP.

I don't see how O_DAX makes this situation better if the goal is to
accelerate unmodified applications...

Vishal, at least the "delete a file with a badblock" model will still
work for implicitly clearing errors with your changes to stop doing
block clearing in fs/dax.c. This combined with a new -EBADBLOCK (as
Dave suggests) and explicit logging of I/Os that fail for this reason
at least gives a chance to communicate errors in files to suitably
aware applications / environments.