Re: [RFC] page lock ordering and OCFS2

From: Badari Pulavarty
Date: Mon Oct 17 2005 - 18:38:22 EST


On Mon, 2005-10-17 at 15:20 -0700, Zach Brown wrote:
> I sent an ealier version of this patch to linux-fsdevel and was met with
> deafening silence. I'm resending the commentary from that first mail and am
> including a new version of the patch. This time it has much clearer naming
> and some kerneldoc blurbs. Here goes...
>
> --
>
> In recent weeks we've been reworking the locking in OCFS2 to simplify things
> and make it behave more like a "local" Linux file system. We've run into an
> ordering inversion between a page's PG_locked and OCFS2's DLM locks that
> protect page cache contents. I'm including a patch at the end of this mail
> that I think is a clean way to give the file system a chance to get the
> ordering right, but we're open to any and all suggestions. We want to do the
> cleanest thing.
>
> OCFS2 maintains page cache coherence between nodes by requiring that a node
> hold a valid lock while there are active pages in the page cache. The page
> cache is invalidated before a node releases a lock so that another node can
> acquire it. While this invalidation is happening new locks can not be acquired
> on that node. This is equivalent to a DLM processing thread acquiring
> PG_locked during truncation while holding a DLM lock. Normal file system user
> tasks come to the a_ops with PG_locked acquired by their callers before they
> have a chance to get DLM locks.
>
> We talked a little about modifying the invalidation path to avoid waiting for
> pages that are held by an OCFS2 path that is waiting for a DLM lock. It seems
> awfully hard to get right without some very nasty code. The truncation on lock
> removal has to write back dirty pages so that other nodes can read it -- it
> can't simply skip these pages if they happened to be locked.
>
> So we aimed right for the lock ordering inversion problem with the appended
> patch. It gives file systems a callback that is tried before page locks are
> acquired that are going to be passed in to the file system's a_ops methods.
>
> So, what do people think about this? Is it reasonable to patch the core to
> help OCFS2 with this ordering inversion?

Sorry for the "deafening silence" on fs-devel. I was trying to see
what exactly I need and how to combine with what you are trying to
do, before I reply. Unfortunately, I don't understand your lock
inversion problem well enough :(

I was looking at ext3 pagelock, trasaction ordering. I wanted
a way to reverse the ordering to support writepages() cleanly.
I guess what you are proposing could be used (in a weird way) to
that, except that I need the support in different places (callers
of writepage, writepages). BTW, I wasn't comfortable touching VFS
locking just of ext3 purpose :(

Thanks,
Badari

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/