Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

From: Bob Peterson
Date: Fri Aug 20 2021 - 09:11:43 EST


On 8/20/21 4:35 AM, Steven Whitehouse wrote:
Hi,

On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
From: Bob Peterson <rpeterso@xxxxxxxxxx>

This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
will allow glocks to be demoted automatically on locking conflicts.
When a locking request comes in that isn't compatible with the locking
state of a holder and that holder has the HIF_MAY_DEMOTE flag set, the
holder will be demoted automatically before the incoming locking
request is granted.

I'm not sure I understand what is going on here. When there are locking
conflicts we generate call backs and those result in glock demotion.
There is no need for a flag to indicate that I think, since it is the
default behaviour anyway. Or perhaps the explanation is just a bit
confusing...

I agree that the whole concept and explanation are confusing. Andreas and I went through several heated arguments about the semantics, comments, patch descriptions, etc. We played around with many different flag names. We did not agree on the best way to describe the whole concept. He didn't like my explanation and I didn't like his. So yes, it is confusing.

My preferred terminology was "DOD" or "Dequeue On Demand", which makes the concept more understandable to me. So basically a process can say
"I need to hold this glock for an unknown and possibly lengthy period of time, but please feel free to dequeue it if it's in your way."
And bear in mind that several processes may do the same, simultaneously.

You can almost think of this as a performance enhancement. This concept allows a process to hold a glock for much longer periods of time, at a lower priority, for example, when gfs2_file_read_iter needs to hold the glock for very long-running iterative reads.

The process requesting a holder with "Dequeue On Demand" must then determine, after its lengthy operation, whether its holder has been stolen away (dequeued on demand). If it has, the process needs to pick up the pieces where it left off.

Meanwhile, another process may need to hold the glock. If its requested mode is compatible, say SH and SH, the lock is simply granted with no further delay. If the mode is incompatible, regardless of whether it's on the local node or a different node in the cluster, these longer-term/lower-priority holders may be dequeued or preempted by another request to hold the glock. Note that although these holders are dequeued-on-demand, they are never "uninitted" as part of the process. Nor must they ever be, since they may be on another process's heap.

This differs from the normal glock demote process, in which the demote bit is set ("requesting" that the glock be demoted) but the requester still needs to block until the holder does its actual dequeue.
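
To make the difference concrete, here is a minimal userspace sketch in C of the behaviour described above. It is a toy model, not the gfs2 code: every toy_* name, the two-mode SH/EX compatibility rule, and the fixed-size holder array are assumptions made purely for illustration. In the actual patch the flag is HIF_MAY_DEMOTE and the helpers are gfs2_holder_allow_demote() and gfs2_holder_disallow_demote().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names and layout are hypothetical, not gfs2's. */
enum toy_mode { TOY_SH, TOY_EX };

struct toy_holder {
	enum toy_mode mode;
	bool queued;     /* true while this holder still holds the glock */
	bool may_demote; /* stands in for HIF_MAY_DEMOTE */
};

#define TOY_MAX_HOLDERS 8

struct toy_glock {
	struct toy_holder *holders[TOY_MAX_HOLDERS];
	size_t nr;
};

/* Assumed rule for this sketch: only SH with SH is compatible. */
static bool toy_compatible(enum toy_mode a, enum toy_mode b)
{
	return a == TOY_SH && b == TOY_SH;
}

static void toy_allow_demote(struct toy_holder *gh)
{
	gh->may_demote = true;
}

static void toy_disallow_demote(struct toy_holder *gh)
{
	gh->may_demote = false;
}

/* Try to grant gh on gl. Returns true if the holder was granted. */
static bool toy_enqueue(struct toy_glock *gl, struct toy_holder *gh)
{
	size_t i, kept = 0;

	assert(gl != NULL && gh != NULL);
	if (gl->nr == TOY_MAX_HOLDERS)
		return false;

	/* A conflicting holder that has not opted in cannot be preempted;
	 * the toy just refuses where a real request would have to wait. */
	for (i = 0; i < gl->nr; i++) {
		struct toy_holder *cur = gl->holders[i];

		if (!toy_compatible(cur->mode, gh->mode) && !cur->may_demote)
			return false;
	}

	/* Dequeue conflicting holders that opted in. The holder structs are
	 * only marked as not queued; they are never torn down here. */
	for (i = 0; i < gl->nr; i++) {
		struct toy_holder *cur = gl->holders[i];

		if (toy_compatible(cur->mode, gh->mode))
			gl->holders[kept++] = cur;
		else
			cur->queued = false;
	}

	gl->holders[kept++] = gh;
	gl->nr = kept;
	gh->queued = true;
	return true;
}
```

In this sketch, a long-running reader would call toy_allow_demote() before its lengthy operation, call toy_disallow_demote() afterwards, and then test its holder's queued field: if it is false, the holder was taken away in the meantime and the glock has to be re-acquired.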

Processes that allow a glock holder to be taken away indicate this by
calling gfs2_holder_allow_demote(). When they need the glock again,
they call gfs2_holder_disallow_demote() and then they check if the
holder is still queued: if it is, they're still holding the glock; if
it isn't, they need to re-acquire the glock.

This allows processes to hang on to locks that could become part of a
cyclic locking dependency. The locks will be given up when a (rare)
conflicting locking request occurs, and don't need to be given up
prematurely.

This seems backwards to me. We already have the glock layer cache the
locks until they are required by another node. We also have the min
hold time to make sure that we don't bounce locks too much. So what is
the problem that you are trying to solve here I wonder?

Again, this is simply allowing preemption of lengthy/low-priority holders, whereas the normal demote process will only demote when the glock is dequeued after this potentially very long period of time.

The minimum hold time solves a different problem, and Andreas and I talked just yesterday about possibly revisiting how that all works. The problem with minimum hold time is that in many cases the glock state machine does not want to grant new holders if the demote bit is on, so it ends up wasting more time than solving the actual problem.
But that's another problem for another day.

Regards,

Bob Peterson