Re: [rfc][patch] remove racy sync_page?

From: Nick Piggin
Date: Mon May 29 2006 - 20:07:22 EST


Andrew Morton wrote:
On Mon, 29 May 2006 19:34:09 +1000
Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:


I'm not completely sure whether this is the bug or not,


"the bug". Are we suposed to know what yo're referring to here?

You were supposed to know, if I hadn't made a typo :) "a bug".



nor what would be
the performance consequences of my attached fix (wrt the block layer). So
you're probably cc'ed because I've found similar threads with your names
on them.



The performance risk is that someone will do lock_page() against a page
whose IO is queued-but-not-yet-kicked-off. We'll go to sleep with no IO
submitted until kblockd or someone else kicks off the IO for us.

Yes.


Try disabling kblockd completely, see what effect that has on performance.

Which is what I want to know. I don't exactly have an interesting
disk setup.

Can we get rid of the whole thing, confusing memory barriers and all? Nobody
uses anything but the default sync_page, and if block rq plugging is terribly
bad for performance, perhaps it should be reworked anyway? It shouldn't be a
correctness thing, right?


What this means is that it is not legal to run lock_page() against a
pagecache page if you don't have a ref on the inode.

Yes. So set_page_dirty_lock is broken, right?
And the wait_on_page_stuff needs an inode ref.
Also splice seems to have broken sync_page.


iirc the main (only?) offender here is direct-io reads into MAP_SHARED
pagecache. (And similar things, like infiniband and nfs-direct).

Well yes, writing to a page would be the main reason to set it dirty.
Is splice broken as well? I'm not sure that it always has a ref on the
inode when stealing a page.

It sounds like you think fixing the set_page_dirty_lock callers wouldn't
be too difficult? I wouldn't know (although the ptrace one should be
able to be turned into a set_page_dirty, because we're holding mmap_sem).

You're sure about all other lock_page()rs? I'm not, given that
set_page_dirty_lock got it so wrong. But you'd have a better idea than
me.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/