Re: [patch rfc] towards supporting O_NONBLOCK on regular files

From: Jeff Moyer
Date: Fri Oct 15 2004 - 10:50:15 EST


==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; Jeff Moyer <jmoyer@xxxxxxxxxx> adds:

==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; "Stephen C. Tweedie" <sct@xxxxxxxxxx> adds:
sct> Hi, On Mon, 2004-10-11 at 19:58, Jeff Moyer wrote:

sct> I think it's worth getting this right in the long term, though.
sct> Getting readahead of indirect blocks right has other benefits too ---
sct> eg. we may be able to fix the situation where we end up trying to read
sct> indirect blocks before we've even submitted the IO for the previous
sct> data blocks, breaking the IO pipeline ordering.
>>> So for the short term, are you an advocate of the patch posted?

sct> In the short term, can't we just disable readahead for O_NONBLOCK?
sct> That has true non-blocking semantics --- if the data is already
sct> available we return it, but if not, it's up to somebody else to
sct> retrieve it.

sct> That's exactly what you want if you're genuinely trying to avoid
sct> blocking at all costs on a really hot event loop, and the semantics
sct> seem to make sense to me. It's not that different from the networking
sct> case where no amount of read() on a non-blocking fd will get you more
sct> data unless there's another process somewhere filling the stream.

jmoyer> Yes, that sounds like a fine idea. Here is a patch which does
jmoyer> this. Andrew, I know you only want bug fixes, but I'd like to get
jmoyer> this into your queue for post 2.6.9, if possible.

I got the partial read case wrong in the last patch. In fact, it looks
like this code path would perform infinite retries before. This should
address that by returning upon the first partial read. Attached is a new
version of the patch.

-Jeff

--- linux-2.6.9-rc4-mm1/mm/filemap.c.orig 2004-10-15 10:33:24.986209880 -0400
+++ linux-2.6.9-rc4-mm1/mm/filemap.c 2004-10-15 11:38:50.869384920 -0400
@@ -692,7 +692,7 @@ void do_generic_mapping_read(struct addr
unsigned long index, end_index, offset;
loff_t isize;
struct page *cached_page;
- int error;
+ int error, nonblock = filp->f_flags & O_NONBLOCK;
struct file_ra_state ra = *_ra;

cached_page = NULL;
@@ -721,16 +721,27 @@ void do_generic_mapping_read(struct addr
nr = nr - offset;

cond_resched();
- page_cache_readahead(mapping, &ra, filp, index);
+ if (!nonblock)
+ page_cache_readahead(mapping, &ra, filp, index);

find_page:
page = find_get_page(mapping, index);
if (unlikely(page == NULL)) {
+ if (nonblock) {
+ desc->error = -EWOULDBLOCK;
+ break;
+ }
handle_ra_miss(mapping, &ra, index);
goto no_cached_page;
}
- if (!PageUptodate(page))
+ if (!PageUptodate(page)) {
+ if (nonblock) {
+ page_cache_release(page);
+ desc->error = -EWOULDBLOCK;
+ break;
+ }
goto page_not_up_to_date;
+ }
page_ok:

/* If users can be writing to this page using arbitrary
@@ -976,10 +987,10 @@ __generic_file_aio_read(struct kiocb *io
desc.error = 0;
do_generic_file_read(filp,ppos,&desc,file_read_actor);
retval += desc.written;
- if (!retval) {
+ if (!retval)
retval = desc.error;
+ if (desc.written != iov[seg].iov_len)
break;
- }
}
}
out: