Re: [PATCH v3] NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals

From: Robert Milkowski
Date: Mon Jan 27 2020 - 09:45:22 EST


On Thu, 23 Jan 2020 at 19:08, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2020-01-22 at 19:10 +0000, Schumaker, Anna wrote:
> > Hi Robert,
> >
> > On Mon, 2020-01-20 at 17:55 +0000, Robert Milkowski wrote:
> > > > -----Original Message-----
> > > > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > Sent: 30 December 2019 15:37
> > > > To: Robert Milkowski <rmilkowski@xxxxxxxxx>
> > > > Cc: Linux NFS Mailing List <linux-nfs@xxxxxxxxxxxxxxx>; Trond
> > > > Myklebust
> > > > <trond.myklebust@xxxxxxxxxxxxxxx>; Anna Schumaker
> > > > <anna.schumaker@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH v3] NFSv4.0: nfs4_do_fsinfo() should not do
> > > > implicit
> > > > lease renewals
> > > >
> > > >
> > > >
> > > > > On Dec 30, 2019, at 10:20 AM, Robert Milkowski <
> > > > > rmilkowski@xxxxxxxxx>
> > > > wrote:
> > > > > From: Robert Milkowski <rmilkowski@xxxxxxxxx>
> > > > >
> > > > > Currently, each time nfs4_do_fsinfo() is called it will do an
> > > > > implicit
> > > > > NFS4 lease renewal, which is not compliant with the NFS4
> > > > specification.
> > > > > This can result in a lease being expired by an NFS server.
> > > > >
> > > > > Commit 83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing
> > > > > leases")
> > > > > introduced implicit client lease renewal in nfs4_do_fsinfo(),
> > > > > which
> > > > > can result in the NFSv4.0 lease to expire on a server side, and
> > > > > servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.
> > > > >
> > > > > This can easily be reproduced by frequently unmounting a sub-
> > > > > mount,
> > > > > then stat'ing it to get it mounted again, which will delay or
> > > > > even
> > > > > completely prevent client from sending RENEW operations if no
> > > > > other
> > > > > NFS operations are issued. Eventually nfs server will expire
> > > > > client's
> > > > > lease and return an error on file access or next RENEW.
> > > > >
> > > > > This can also happen when a sub-mount is automatically
> > > > > unmounted due
> > > > > to inactivity (after nfs_mountpoint_expiry_timeout), then it is
> > > > > mounted again via stat(). This can result in a short window
> > > > > during
> > > > > which client's lease will expire on a server but not on a
> > > > > client.
> > > > > This specific case was observed on production systems.
> > > > >
> > > > > This patch makes an explicit lease renewal instead of an
> > > > > implicit one,
> > > > > by adding RENEW to a compound operation issued by
> > > > > nfs4_do_fsinfo(),
> > > > > similarly to NFSv4.1 which adds SEQUENCE operation.
> > > > >
> > > > > Fixes: 83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing
> > > > > leases")
> > > > > Signed-off-by: Robert Milkowski <rmilkowski@xxxxxxxxx>
> > > >
> > > > Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > >
> > > >
> > >
> > > How do we progress it further?
> >
> > Thanks for following up! I have the patch included in my linux-next
> > branch for
> > the next merge window.
>
> NACK. This is the wrong way to solve the problem. Lease renewal in
> NFSv4 does not need to be tied to fsinfo. It creates an unnecessary
> extra error condition that has absolutely nothing to do with the
> functionality of retrieving per-filesystem attributes.

Well, we also do it for NFSv4.1+ with the SEQUENCE operation being
send by fsinfo, and as Chuck pointed out
it makes sense to do similarly for 4.0, which is what this patch does.

Or as per the v2 version of the patch, not do the implicit RENEW for
4.0, which was a simpler patch,
but then not in-line with 4.1+.

>
> All that needs to be done here is to move the setting of clp-
> >cl_last_renewal _out_ of nfs4_set_lease_period(), and just have
> nfs4_proc_setclientid_confirm() and nfs4_update_session() call
> do_renew_lease().
>

This would also require nfs4_setup_state_renewal() to call
do_renew_lease() I think - at least it currently calls
nfs4_set_lease_period().
Also, iirc fsinfo() not setting cl_last_renewal leads to
cl_last_renewal initialization issues under some circumstances.

Then the RFC 7530 in section 16.34.5 states:
"SETCLIENTID_CONFIRM does not establish or renew a lease.", so calling
do_renew_lease() from nfs4_setclientid_confirm() doesn't seem to be
ok.

I'm not sure if is is valid to do implicit lease renewal in
nfs4_update_session() either...


Anyway, the patch would be something like (haven't tested it yet):

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 76d3716..a7af864 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5019,16 +5019,13 @@ static int nfs4_do_fsinfo(struct nfs_server
*server, struct nfs_fh *fhandle, str
struct nfs4_exception exception = {
.interruptible = true,
};
- unsigned long now = jiffies;
int err;

do {
err = _nfs4_do_fsinfo(server, fhandle, fsinfo);
trace_nfs4_fsinfo(server, fhandle, fsinfo->fattr, err);
if (err == 0) {
- nfs4_set_lease_period(server->nfs_client,
- fsinfo->lease_time * HZ,
- now);
+ nfs4_set_lease_period(server->nfs_client,
fsinfo->lease_time * HZ)
break;
}
err = nfs4_handle_exception(server, err, &exception);
@@ -6146,6 +6143,10 @@ int nfs4_proc_setclientid_confirm(struct nfs_client *clp,
clp->cl_clientid);
status = rpc_call_sync(clp->cl_rpcclient, &msg,
RPC_TASK_TIMEOUT | RPC_TASK_NO_ROUND_ROBIN);
+ if(status == 0) {
+ unsigned long now = jiffies;
+ do_renew_lease(clp, now);
+ }
trace_nfs4_setclientid_confirm(clp, status);
dprintk("NFS reply setclientid_confirm: %d\n", status);
return status;
@@ -8590,6 +8591,8 @@ static void nfs4_update_session(struct
nfs4_session *session,
if (res->flags & SESSION4_BACK_CHAN)
memcpy(&session->bc_attrs, &res->bc_attrs,
sizeof(session->bc_attrs));
+ unsigned long now = jiffies;
+ do_renew_lease(session->clp, now);
}

static int _nfs4_proc_create_session(struct nfs_client *clp,
diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index 6ea431b..ff876dd 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -138,15 +138,12 @@
*
* @clp: pointer to nfs_client
* @lease: new value for lease period
- * @lastrenewed: time at which lease was last renewed
*/
void nfs4_set_lease_period(struct nfs_client *clp,
- unsigned long lease,
- unsigned long lastrenewed)
+ unsigned long lease)
{
spin_lock(&clp->cl_lock);
clp->cl_lease_time = lease;
- clp->cl_last_renewal = lastrenewed;
spin_unlock(&clp->cl_lock);

/* Cap maximum reconnect timeout at 1/2 lease period */
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 3455232..d7b02fd 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -102,7 +102,8 @@ static int nfs4_setup_state_renewal(struct nfs_client *clp)
now = jiffies;
status = nfs4_proc_get_lease_time(clp, &fsinfo);
if (status == 0) {
- nfs4_set_lease_period(clp, fsinfo.lease_time * HZ, now);
+ nfs4_set_lease_period(clp, fsinfo.lease_time * HZ);
+ nfs4_do_renew_lease(clp, now);
nfs4_schedule_state_renewal(clp);
}



But it potentially has issues, as just pointed out, mainly with not
being compliant with rfc again.

Also see the comments above about the SEQUENCE.
Trond - ?

Chuck, Anna - which way do you prefer, the one proposed by Chuck or
the one now proposed by Trond?

Or perhaps my v2 patch which is least invasive and just stops doing
implicit renewals for 4.0 if cl_last_renewal is already set.

I personally think the way proposed by Chuck is fully compliant with
RFC and in-line with what we currently do for 4.1 via SEQUENCE,
so perhaps the best option. ?

--
Robert Milkowski