Re: [RFC] page lock ordering and OCFS2

From: Andrew Morton
Date: Mon Oct 17 2005 - 18:19:00 EST


Zach Brown <zach.brown@xxxxxxxxxx> wrote:
>
>
> I sent an ealier version of this patch to linux-fsdevel and was met with
> deafening silence.

Maybe because nobody understood your description ;)

> I'm resending the commentary from that first mail and am
> including a new version of the patch. This time it has much clearer naming
> and some kerneldoc blurbs. Here goes...
>
> --
>
> In recent weeks we've been reworking the locking in OCFS2 to simplify things
> and make it behave more like a "local" Linux file system. We've run into an
> ordering inversion between a page's PG_locked and OCFS2's DLM locks that
> protect page cache contents. I'm including a patch at the end of this mail
> that I think is a clean way to give the file system a chance to get the
> ordering right, but we're open to any and all suggestions. We want to do the
> cleanest thing.

The patch is of course pretty unwelcome: lots of weird stuff in the core
VFS which everyone has to maintain but probably will not test.

So I think we need a better understanding of what the locking inversion
problem is, so we can perhaps find a better solution. Bear in mind that
ext3 has (rare, unsolved) lock inversion problems in this area as well, so
commonality will be sought for.

> OCFS2 maintains page cache coherence between nodes by requiring that a node
> hold a valid lock while there are active pages in the page cache.

"active pages in the page cache" means present pagecache pages in the node
which holds pages in its pagecache, yes?

> The page
> cache is invalidated before a node releases a lock so that another node can
> acquire it. While this invalidation is happening new locks can not be acquired
> on that node. This is equivalent to a DLM processing thread acquiring
> PG_locked during truncation while holding a DLM lock. Normal file system user
> tasks come to the a_ops with PG_locked acquired by their callers before they
> have a chance to get DLM locks.

So where is the lock inversion?

Perhaps if you were to cook up one of those little threadA/threadB ascii
diagrams we could see where the inversion occurs?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/