Re: [rfc][patch] remove racy sync_page?

From: Andrew Morton
Date: Mon May 29 2006 - 15:19:28 EST


On Mon, 29 May 2006 19:34:09 +1000
Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

> I'm not completely sure whether this is the bug or not,

"the bug". Are we suposed to know what yo're referring to here?

> nor what would be
> the performance consequences of my attached fix (wrt the block layer). So
> you're probably cc'ed because I've found similar threads with your names
> on them.
>

The performance risk is that someone will do lock_page() against a page
whose IO is queued-but-not-yet-kicked-off. We'll go to sleep with no IO
submitted until kblockd or someone else kicks off the IO for us.

Try disabling kblockd completely, see what effect that has on performance.
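
For anyone following along, the sleep-side helper in mm/filemap.c of this
vintage looks roughly like this (paraphrased from memory; the
a_ops->sync_page hook is normally block_sync_page(), which unplugs the
backing device's request queue):

	static int sync_page(void *word)
	{
		struct address_space *mapping;
		struct page *page;

		page = container_of((unsigned long *)word, struct page, flags);

		/*
		 * No reference is held on ->mapping here; only a memory
		 * barrier guards against acting on a stale value. This is
		 * the unpinned dereference (and one of the "confusing
		 * memory barriers") Nick is talking about.
		 */
		smp_mb();
		mapping = page_mapping(page);
		if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
			mapping->a_ops->sync_page(page);
		io_schedule();
		return 0;
	}

This is what lets the waiter itself kick any queued-but-unplugged IO;
without it, a plugged queue sits idle until kblockd (or another submitter)
runs it.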

>
> lock_page (and other waiters on page flag bits) uses sync_page when sleeping
> on a bit. sync_page, however, requires that the page's mapping be pinned
> (which is what we're sometimes trying to lock the page for).
>
> A blatant offender is set_page_dirty_lock, which falls victim to the very
> race it purports to fix. Perhaps all callers could be fixed; however, the
> pinned-mapping precondition on lock_page is counter-intuitive, and I'd bet
> other callers of lock_page or wait_on_page_bit have got it wrong too.
>
> Also: splice can change a page's mapping, so it would have been possible to
> use the wrong mapping to "sync" a page.
>
> Can we get rid of the whole thing, confusing memory barriers and all? Nobody
> uses anything but the default sync_page, and if block rq plugging is terribly
> bad for performance, perhaps it should be reworked anyway. It shouldn't be a
> correctness thing, right?

What this means is that it is not legal to run lock_page() against a
pagecache page if you don't have a ref on the inode.
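
Nick's "blatant offender" makes this concrete. From memory,
set_page_dirty_lock() (mm/page-writeback.c) of this era is essentially:

	int set_page_dirty_lock(struct page *page)
	{
		int ret;

		/*
		 * lock_page() may sleep via sync_page(), which peeks at
		 * page->mapping with no reference held - exactly the
		 * unpinned case this helper is meant to make safe.
		 */
		lock_page(page);
		ret = set_page_dirty(page);
		unlock_page(page);
		return ret;
	}

So the helper that exists to dirty a page without a pinned mapping trips
over the unpinned mapping in its own sleep path.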

iirc the main (only?) offender here is direct-io reads into MAP_SHARED
pagecache. (And similar things, like infiniband and nfs-direct).
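
Schematically, the risky pattern in those paths is something like the below
(a hypothetical completion handler; the function name is illustrative, not
taken from any one driver):

	/*
	 * Pages were pinned earlier with get_user_pages(). Nothing here
	 * holds a reference on the inode backing a MAP_SHARED mapping,
	 * so the mapping can go away while we sleep in lock_page().
	 */
	static void dma_read_done(struct page **pages, int nr_pages)
	{
		int i;

		for (i = 0; i < nr_pages; i++) {
			set_page_dirty_lock(pages[i]);
			page_cache_release(pages[i]);
		}
	}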
