Re: Nonblocking buffered AIO from userspace

From: Jan Kara
Date: Mon May 27 2013 - 16:47:20 EST


Hello,

On Thu 23-05-13 16:49:49, Milosz Tanski wrote:
> I need some advice on the best way to accomplish non-blocking buffered
> disk IO from my user space application. Unlike some of the other
> database systems, I'm trying to outsource as much work to the kernel as
> possible. I would prefer to avoid having to resort to O_DIRECT and
> io_submit to fetch the data and having to reimplement the page /
> buffer cache & read ahead.
>
> The application is read heavy with occasional long running write jobs.
> Since I'm not too concerned about the performance on the write path I
> am able to run that work in threads and block.
>
> Currently I'm mmapping the files, which makes the read path quite simple
> and is great for disk scans when my data set is stored in memory. When
> the data is not cached the performance becomes more unpredictable,
> esp. when I'm doing an indexed read (giant bitmap indexes). Here's what
> my IO path looks like:
>
> application <--> fscache (SSD) <--> cephfs <--> ceph cluster
>
> Ultimately what I'd like is a way to do non-blocking scatter gather IO
> from disk or page cache into my application. I'd like to be
> non-blocking because it often happens that I can do something useful
> while waiting on IO like uncompress indexes for another request that
> is waiting, process network IO, etc.
>
> With mmap my blocking is unpredictable and mlock() blocks and only
> lets me lock a range and not a vector of page ranges.
Maybe the API you are looking for is madvise(MADV_WILLNEED)? That forces
asynchronous readahead for the specified range.
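
A minimal sketch of how that could look for a vector of ranges inside an
existing mapping, since madvise() itself takes only one range per call.
The helper name prefetch_ranges() and the use of struct iovec to describe
the ranges are assumptions made for illustration, not an existing API.
The start of each range is rounded down to a page boundary because
madvise() requires a page-aligned address.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes madvise() and MADV_WILLNEED in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint readahead for each of the n ranges.
 * Each iovec describes a byte range inside an already mmap()ed region.
 * Returns the number of ranges for which madvise() succeeded.
 */
static int prefetch_ranges(const struct iovec *ranges, int n)
{
    const uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    int accepted = 0;

    for (int i = 0; i < n; i++) {
        uintptr_t start = (uintptr_t)ranges[i].iov_base;
        uintptr_t end = start + ranges[i].iov_len;
        /* madvise() wants a page-aligned start; the length may be unaligned. */
        uintptr_t aligned = start & ~(page - 1);

        if (madvise((void *)aligned, end - aligned, MADV_WILLNEED) == 0)
            accepted++;
    }
    return accepted;
}
```

The call only queues the readahead; a later access to a page that has not
arrived yet will still block in the page fault, so this is a hint to shrink
the stall rather than a completion-based interface.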

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR