Re: [Cluster-devel] [RFC 4/9] gfs2: Fix mmap + page fault deadlocks (part 1)

From: Steven Whitehouse
Date: Sun Jun 13 2021 - 05:10:10 EST


Hi,

On Sat, 2021-06-12 at 21:35 +0000, Al Viro wrote:
> On Sat, Jun 12, 2021 at 09:05:40PM +0000, Al Viro wrote:
>
> > Is the above an accurate description of the mainline situation
> > there?
> > In particular, normal read doesn't seem to bother with locks at
> > all.
> > What exactly are those cluster locks for in O_DIRECT read?
>
> BTW, assuming the lack of contention, how costly is
> dropping/regaining
> such cluster lock?
>

The answer is that it depends...

The locking modes for glocks for inodes look like this:

========== ========== ============== ========== ==============
Glock mode Cache data Cache Metadata Dirty Data Dirty Metadata
========== ========== ============== ========== ==============
UN No No No No
SH Yes Yes No No
DF No Yes No No
EX Yes Yes Yes Yes
========== ========== ============== ========== ==============

The above is a copy & paste from Documentation/filesystems/gfs2-
glocks.rst. If you think of these locks as cache control, then it makes
a lot more sense.

The DF (deferred) mode is there only for DIO. It is a shared lock mode
that is incompatible with the normal SH mode. That is because it is ok
to cache data pages under SH but not under DF. That the only other
difference between the two shared modes. DF is used for both read and
write under DIO meaning that it is possible for multiple nodes to read
& write the same file at the same time with DIO, leaving any
synchronisation to the application layer. As soon as one performs an
operation which alters the metadata tree (truncate, extend, hole
filling) then we drop back to the normal EX mode, so DF is only used
for preallocated files.

Your original question though was about the cost of locking, and there
is a wide variation according to circumstances. The glock layer caches
the results of the DLM requests and will continue to hold glocks gained
from remote nodes until either memory pressure or requests to drop the
lock from another node is received.

When no other nodes are interested in a lock, all such cluster lock
activity is local. There is a cost to it though, and if (for example)
you tried to take and drop the cluster lock on every page, that would
definitely be noticeable. There are probably optimisations that could
be done on what is quite a complex code path, but in general thats what
we've discovered from testing. The introduction of ->readpages() vs the
old ->readpage() made a measurable difference and likewise on the write
side, iomap has also show performance increases due to the reduction in
locking on multi-page writes.

If there is another node that has an interest in a lock, then it can
get very expensive in terms of latency to regain a lock. To drop the
lock to a lower mode may involve I/O (from EX mode) and journal
flush(es) and to get the lock back again involves I/O to other nodes
and then a wait while they finish what they are doing. To avoid
starvation there is a "minimum hold time" so that when a node gains a
glock, it is allowed to retain it, in the absence of local requests,
for a short period. The idea being that if a large number of glock
requests are being made on a node, each for a short time, we allow
several of those to complete before we do the expensive glock release
to another node.

See Documentation/filesystems/gfs2-glocks.rst for a longer explanation
and locking order/rules between different lock types,

Steve.