Re: [PATCH] lockd: fix handling of grace period after long periods of inactivity

From: Fernando Luis Vázquez Cao
Date: Thu Aug 14 2008 - 21:32:39 EST


Hi Bruce!

On Thu, 2008-08-14 at 15:06 -0400, J. Bruce Fields wrote:
> On Thu, Aug 14, 2008 at 08:08:16PM +0900, NAKANO Hiroaki wrote:
> > lockd uses time_before() to determine whether the grace period has
> > expired. This would seem to be enough to avoid timer wrap-around issues,
> > but, unfortunately, that is not the case. The time_* family of
> > comparison functions can safely be used to compare jiffies values that
> > are relatively close in time, but they stop working once the two values
> > are more than LONG_MAX ticks apart. nfsd can suffer this problem because
> > the time_before() comparison in lockd() is not performed until the first
> > request comes in, which means that if there is no lockd traffic for more
> > than LONG_MAX ticks we are screwed.
> >
> > The implication of this is that once time_before() starts misbehaving,
> > any attempt by an NFS client to execute fcntl() will be answered with an
> > NLM_LCK_DENIED_GRACE_PERIOD message for 25 days (assuming HZ=1000). In
> > other words, the 50-second grace period could turn into a grace period
> > of 50 days or more.
> >
> > This patch corrects this behavior by implementing the grace period with a
> > (retriggerable) timer.
> >
> > Note: This bug was analyzed independently by Oda-san <oda@xxxxxxxxxxxxx>
> > and myself.
>
> Good catch! Did you actually run across this in practice? I would've
> thought it relatively unusual to have a lockd that didn't receive its
> first lock request until 25 days after startup.
Yes, we did find this problem in production. More often than one would
wish, installing new software on a system that has been running without
a hiccup for weeks or months is all it takes to bring about mayhem.

> I still have a mild preference for a work struct just in case we end up
> wanting to do something slightly more complicated to end the grace
> period, but I don't really have anything in mind.
For simplicity, I think we could get Nakano-san's patch merged first.
If needed, moving to a work-based solution should be relatively easy.

Thank you for your comments!

- Fernando