Re: Crash when unmounting NFS/TCP with -f

From: Trond Myklebust
Date: Thu May 05 2005 - 07:01:56 EST


On Thu, 05.05.2005 at 12:17 (+0200), Brice Goglin wrote:

> Unable to handle kernel paging request at virtual address ffffff98
> printing eip:
> e0aaa07a
> *pde = 00002067
> *pte = 00000000
> Oops: 0002 [#1]
> PREEMPT
>
> Modules linked in: netconsole sd_mod usb_storage vfat fat loop isofs
> zlib_inflate nls_cp850 nls_iso8859_15 smbfs nfs lockd sunrpc i915 tun
> ipt_MASQUERADE iptable_nat ipt_state ip_conntrack iptable_filter
> ip_tables floppy uhci_hcd ehci_hcd dm_mod snd_intel8x0 snd_ac97_codec
>
> CPU: 0
> EIP: 0060:[<e0aaa07a>] Not tainted VLI
> EFLAGS: 00010297 (2.6.11=Macvin)
> EIP is at rpc_wake_up_status+0x6a/0x80 [sunrpc]
> eax: ffffff84 ebx: d0065888 ecx: 00000001 edx: c146e000
> esi: fffffffb edi: d0065888 ebp: d0065800 esp: c146ef14
> ds: 007b es: 007b ss: 0068
> Process events/0 (pid: 3, threadinfo=c146e000 task=c1473020)
> Stack: c146ef44 d0065800 00000283 fffffffb e0aa710e d0065888 fffffffb
> 00120dcb c1473184 00000000 d0065904

Have you tried the attached patch? Andrew has already included it in the
-mm series.

Cheers,
Trond
--- Begin Message ---
Make the socket transport kick the event queue to start socket connects
immediately. This should improve responsiveness of applications that are
sensitive to slow mount operations (like automounters).

We are now also careful to cancel the connect worker before destroying
the xprt. This eliminates a race where xprt_destroy can finish before
the connect worker is even allowed to run.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. Hard-code impossibly small connect timeout.

Version: Fri, 29 Apr 2005 15:32:01 -0400

Signed-off-by: Chuck Lever <cel@xxxxxxxxxx>
---

net/sunrpc/xprt.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletion(-)


diff -X /home/cel/src/linux/dont-diff -Naurp 10-rpc-reconnect/net/sunrpc/xprt.c 11-xprt-flush-connects/net/sunrpc/xprt.c
--- 10-rpc-reconnect/net/sunrpc/xprt.c 2005-04-29 15:18:47.677108000 -0400
+++ 11-xprt-flush-connects/net/sunrpc/xprt.c 2005-04-29 15:29:36.637250000 -0400
@@ -569,8 +569,11 @@ void xprt_connect(struct rpc_task *task)
 		if (xprt->sock != NULL)
 			schedule_delayed_work(&xprt->sock_connect,
 					RPC_REESTABLISH_TIMEOUT);
-		else
+		else {
 			schedule_work(&xprt->sock_connect);
+			if (!RPC_IS_ASYNC(task))
+				flush_scheduled_work();
+		}
 	}
 	return;
 out_write:
@@ -1666,6 +1669,10 @@ xprt_shutdown(struct rpc_xprt *xprt)
 	rpc_wake_up(&xprt->backlog);
 	wake_up(&xprt->cong_wait);
 	del_timer_sync(&xprt->timer);
+
+	/* synchronously wait for connect worker to finish */
+	cancel_delayed_work(&xprt->sock_connect);
+	flush_scheduled_work();
 }
 
 /*

--- End Message ---