Re: disk IO directly from PCI memory to block device sectors

From: Jens Axboe
Date: Fri Sep 26 2008 - 06:20:22 EST


On Fri, Sep 26 2008, Alan Cox wrote:
> On Fri, 26 Sep 2008 11:11:35 +0200
> Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
>
> > On Fri, Sep 26 2008, Alan Cox wrote:
> > > > What I'm looking for is a more generic/driver independent way of sticking
> > > > contents of PCI RAM onto a disk.
> > >
> > > Ermm seriously why not have a userspace task with the PCI RAM mmapped
> > > and just use write() like normal sane people do ?
> >
> > To avoid the fault and copy, I would assume.
>
> It's a write to a raw partition, so with O_DIRECT you won't have to copy,
> and MAP_POPULATE will premap the object if even the first write needs to
> occur without faulting overhead.

You are still going through get_user_pages() for each write. Since I
would imagine the writes would generally be large, the hit would not be
too bad (but it's still there).

Depending on the hardware, it may or may not be a big deal. But the path
from device to disk is definitely a lot longer and more complex with the
mmap/write approach.
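
For reference, a minimal userspace sketch of the mmap/write path being
discussed. The helper name copy_mapped_to_fd() is hypothetical, and it
assumes src_fd refers to something mmap()able (for example a device node
exporting the PCI memory) and that dst_fd was opened by the caller, with
O_DIRECT if the copy is to be avoided. With O_DIRECT the length must be a
multiple of the logical block size, and whether the kernel can pin the
mapping for direct IO depends on how the driver implements mmap.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first `len` bytes of src_fd and write()
 * them to dst_fd at its current file offset.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int copy_mapped_to_fd(int src_fd, int dst_fd, size_t len)
{
    assert(len > 0);

    /* MAP_POPULATE prefaults the mapping so the first write() does not
     * take page faults on the source buffer. */
    unsigned char *src = mmap(NULL, len, PROT_READ,
                              MAP_SHARED | MAP_POPULATE, src_fd, 0);
    if (src == MAP_FAILED)
        return -1;

    size_t done = 0;
    int rc = 0;
    while (done < len) {
        /* With O_DIRECT on dst_fd, each write() still has to pin the
         * user pages (get_user_pages()) before building the IO. */
        ssize_t n = write(dst_fd, src + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        done += (size_t)n;
    }

    munmap(src, len);
    return rc;
}
```

A caller would open the source node and the raw partition itself and pass
both descriptors in; nothing in the sketch depends on a particular driver.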

Another alternative would be using splice: if the PCI device exposed a
char device node, you could support ->splice_read() there, which would
just fill the pages into the pipe buffer. Then change the block device
fops ->splice_write() to go direct to the block device through a bio
instead of using the page cache based generic_file_splice_write(). Such
a change would actually make sense to do if the block device has been
opened with O_DIRECT. It would get you about the same performance as
doing it in-kernel; the only extra overhead would be two syscalls per
64k (well, probably only one extra syscall, since you would probably
need an ioctl/syscall to initiate the in-kernel activity as well). So
just about as free as you could get.
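
The userspace side of that would look roughly like the sketch below: two
splice() calls per chunk, one into a pipe and one out of it. The name
splice_copy() is hypothetical, the 64k chunk matches the default Linux
pipe capacity, and the sketch only shows the syscall pattern. Whether the
data actually avoids the page cache depends on the ->splice_read() and
->splice_write() changes described above, which are assumed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (64 * 1024)   /* default pipe capacity on Linux */

/*
 * Hypothetical helper: move up to `len` bytes from in_fd to out_fd
 * through a pipe using splice(). Returns the number of bytes moved,
 * or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    assert(in_fd >= 0 && out_fd >= 0);

    int p[2];
    if (pipe(p) < 0)
        return -1;

    size_t total = 0;
    ssize_t rc = 0;
    while (total < len) {
        size_t want = len - total;
        if (want > CHUNK)
            want = CHUNK;

        /* First syscall: source (e.g. the device node) into the pipe. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, want, SPLICE_F_MOVE);
        if (n < 0) {
            rc = -1;
            break;
        }
        if (n == 0)
            break;              /* end of input */

        /* Second syscall: pipe into the destination (e.g. block device).
         * Loop in case the kernel drains the pipe in several steps. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                               SPLICE_F_MOVE);
            if (m <= 0) {
                rc = -1;
                break;
            }
            left -= m;
        }
        if (rc < 0)
            break;
        total += (size_t)n;
    }

    close(p[0]);
    close(p[1]);
    return rc < 0 ? -1 : (ssize_t)total;
}
```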

--
Jens Axboe
