Re: [PATCH 0/3] Page based O_DIRECT and O_DIRECT loop

From: Jens Axboe
Date: Tue Aug 18 2009 - 02:42:11 EST


On Mon, Aug 17 2009, Christoph Hellwig wrote:
> On Mon, Aug 17, 2009 at 12:34:31PM +0200, Jens Axboe wrote:
> > Hi,
> >
> > Currently it's not feasible to use loop for O_DIRECT workloads
> > that expect some sort of data integrity, since loop always
> > uses page cache IO. Some time ago, I posted a variant of loop
> > that used remapping to function like a proper disk, but that patch
> > was a bit fragile in that it relied loop maintaining a fs block
> > remapping tree. This time I wanted to take a different approach.
> >
> > The first two patches in this series convert the O_DIRECT IO path
> > to be page based instead of passing down the iovecs. This is
> > basically a first version so don't expect too much of it, but it
> > does seem to work fine for me. Most O_DIRECT users were one-liner
> > conversions, NFS required a bit more effort (and that effort has, btw,
> > not been tested at all yet). At least the diffstat for the core bits
> > don't look too bad:
>
> Nice! I took a quick look and here are some superficial comments:
>
> - right nbow this loveses all the benefits of using preadv/pwritev.
> Qemu/KVM will not be happy about this. We need some way to submit
> each vector asynchronously first and then only wait for all of them
> to complete.

Agree. In general the O_DIRECT bits need a looking at from the plugging
perspective.

> - do_dio is a rather odd name. What about resurrecting
> generic_file_direct_IO?

It is probably too weird. I'll change it.

> - it would be great if we could kill dio_args.user_addr and move
> everything that deals with it to do_dio/generic_file_direct_IO.
> Given that only look at it in __blockdev_direct_IO and
> direct_io_worker beforew we start the real work that sounds doable
> relatively easily. The only issue might be NFS.
> After this all the bits that deal with user addresses could live
> in filemap.c and keep this totally out of direct-io.c

I'll take a look at this, but may defer this to a later patch.

> - why is the rw argument no part of struct dio_args? IMHO it
> should move in there.

Dunno, it may as well go in there. Will fix that.

> - patch 1 should probably be split further into a first patch just
> introducing struct dio_args, and then doing the heavy lifting without
> touching all the filesystems.

OK, so a dio_args with the current arguments, then a switch over to the
page based stuff. That would probably be cleaner, I'll split it up.

> Also this stuff will massively catch with my patch to sort out the
> locking mess in __blockdev_direct_IO, you might consider working ontop
> of that.

I did notice that. I'll work off mainline for the next version, then
I'll take the pain of merging on top of your locking rewrite next.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/