Re: [PATCH] SUNRPC: have soft RPC tasks return -ETIMEDOUT instead of -EIO on major connect timeout

From: Chuck Lever
Date: Mon Mar 31 2008 - 15:56:33 EST


On Mar 29, 2008, at 12:44 PM, Trond Myklebust wrote:
On Sat, 2008-03-29 at 08:49 -0400, Jeff Layton wrote:
NFSv4 background mounts do not currently work correctly. While we could
try to fix this in userspace, I think it's really a kernel problem...

When a soft RPC task experiences a major timeout during a connection
attempt, it does an rpc_exit with a return code of -EIO. For NFSv4
mounts, this makes the mount() syscall return -EIO. mount.nfs4 then
interprets that as a "permanent" error, and won't attempt a background
mount when bg is specified. Fix this by making call_timeout() do the
rpc_exit() with an error of -ETIMEDOUT.

This fixes the background mount issue, but does make other syscalls
on soft mounts return ETIMEDOUT instead of EIO in this situation.

Comments welcome.

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
net/sunrpc/clnt.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8c6a7f1..b6d409e 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1162,7 +1162,7 @@ call_timeout(struct rpc_task *task)
if (RPC_IS_SOFT(task)) {
printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
clnt->cl_protname, clnt->cl_server);
- rpc_exit(task, -EIO);
+ rpc_exit(task, -ETIMEDOUT);
return;
}

While that may be acceptable for the mount() syscall, I don't think
POSIX applications are quite ready to deal with ETIMEDOUT as an error
for stat() or chdir().

Having the RPC client throw -EIO on a timeout always seemed a little crude to me. EIO is quite overloaded -- the same error is returned if there's an XDR decoding error, for example. Clearly other consumers of RPC (mount, for example) would like a distinction between a timeout and an outright I/O error.

The fact that applications using NFS files can't deal with -ETIMEDOUT should probably be managed in the NFS client, not in the RPC client. Perhaps it could be handled with a wrapper function, like the NFS client handles EJUKEBOX.
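
To make the wrapper idea concrete, here is a rough sketch in plain user-space C, not kernel code. The name nfs_map_rpc_status() and the is_mount flag are invented for illustration; nothing like this exists in the NFS client as written, and a real version would more likely sit in the per-operation call paths.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (assumption, not an existing kernel function):
 * translate an RPC-level status into the value an NFS caller should see.
 */
static int nfs_map_rpc_status(int rpc_status, int is_mount)
{
	/* Status follows the kernel convention: 0 or a negative errno. */
	assert(rpc_status <= 0);

	/*
	 * Ordinary file operations keep seeing -EIO on a timeout, since
	 * applications may not expect ETIMEDOUT from stat() or chdir().
	 * The mount path gets the timeout passed through unchanged.
	 */
	if (rpc_status == -ETIMEDOUT && !is_mount)
		return -EIO;

	return rpc_status;
}
```

With something along these lines, the RPC client could return -ETIMEDOUT unconditionally, and only the mount path would let it reach user space.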

So I agree that Jeff's patch is insufficient as it stands, but the underlying idea is probably a good one.

Userland has the clnt_geterr() function that returns more detailed 'RPC
level' errors. While that 'error function call' approach doesn't work in
a multi-threaded environment, we might still be able to add the
equivalent of a pointer to an 'rpc_err' structure to the rpc_task, and
then have functions like call_timeout() (and especially call_verify()!)
fill in more detailed error info if that pointer is non-zero?


That's not a bad idea either.
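
For the sake of discussion, the optional error-detail pointer might look roughly like the following. This is a user-space mock-up only: struct sketch_task stands in for rpc_task, and every name here (rpc_err_kind, sketch_exit(), sketch_call_timeout()) is made up for the example rather than taken from net/sunrpc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* All names below are hypothetical; none exist in net/sunrpc as written. */
enum rpc_err_kind {
	RPC_ERR_NONE = 0,
	RPC_ERR_TIMEDOUT,	/* major timeout on a soft task */
	RPC_ERR_BAD_REPLY,	/* e.g. reply failed verification or decoding */
};

struct rpc_err {
	enum rpc_err_kind kind;
};

/* Stand-in for struct rpc_task, reduced to the two fields that matter here. */
struct sketch_task {
	int status;		/* errno-style result, unchanged for callers */
	struct rpc_err *err;	/* optional detail; NULL if the caller does not care */
};

/* Record the errno-style status and, if requested, the finer-grained cause. */
static void sketch_exit(struct sketch_task *task, int status,
			enum rpc_err_kind kind)
{
	assert(status <= 0);
	task->status = status;
	if (task->err != NULL)
		task->err->kind = kind;
}

/* Soft-task major timeout: callers still get -EIO, plus detail if they asked. */
static void sketch_call_timeout(struct sketch_task *task)
{
	sketch_exit(task, -EIO, RPC_ERR_TIMEDOUT);
}
```

The attraction is that existing callers see no change in the returned status, while a consumer such as mount could supply a struct rpc_err and check for RPC_ERR_TIMEDOUT.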

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com