Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

From: Trond Myklebust
Date: Fri Jun 19 2015 - 16:30:54 EST


On Fri, 2015-06-19 at 15:52 -0400, Jeff Layton wrote:
> On Fri, 19 Jun 2015 13:39:08 -0400
> Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > > On Fri, 19 Jun 2015 12:25:53 -0400
> > > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > >
> > > > I don't see that 55201 anywhere. But then again, I didn't look for it
> > > > before the port disappeared. I could reboot and look for it again. I
> > > > should have saved the full netstat -tapn as well :-/
> > >
> > > Of course I didn't find it anywhere, that's the port on my wife's box
> > > that port 947 was connected to.
> > >
> > > Now I even went over to my wife's box and ran
> > >
> > > # rpcinfo -p localhost
> > >    program vers proto   port  service
> > >     100000    4   tcp    111  portmapper
> > >     100000    3   tcp    111  portmapper
> > >     100000    2   tcp    111  portmapper
> > >     100000    4   udp    111  portmapper
> > >     100000    3   udp    111  portmapper
> > >     100000    2   udp    111  portmapper
> > >     100024    1   udp  34243  status
> > >     100024    1   tcp  34498  status
> > >
> > > which doesn't show anything.
> > >
> > > but something is listening to that port...
> > >
> > > # netstat -ntap |grep 55201
> > > tcp 0 0 0.0.0.0:55201 0.0.0.0:* LISTEN
> >
> >
> > Hang on. This is on the client box while there is an active NFSv4
> > mount? Then that's probably the NFSv4 callback channel listening for
> > delegation callbacks.
> >
> > Can you please try:
> >
> > echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf
> >
> > and then either reboot the client or unload and then reload the nfs
> > modules before reattempting the mount. If this is indeed the callback
> > channel, then that will move your phantom listener to port 4048...
> >
>
> Right, it was a little unclear to me before, but it now seems clear
> that the callback socket that the server is opening to the client is
> the one squatting on the port.
>
> ...and that sort of makes sense, doesn't it? That rpc_clnt will stick
> around for the life of the client's lease, and the rpc_clnt binds to a
> particular port so that it can reconnect using the same one.
>
> Given that Steven has done the legwork and figured out that reverting
> those commits fixes the issue, then I suspect that the real culprit is
> caf4ccd4e88cf2.
>
> The client is likely closing down the other end of the callback
> socket when it goes idle. Before that commit, we probably did an
> xs_close on it, but now we're doing an xs_tcp_shutdown and that leaves
> the port bound.
>

Agreed. I've been looking into whether or not there is a simple fix.
Reverting those patches is not an option, because the whole point was
to ensure that the socket is in the TCP_CLOSED state before we release
the socket.
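
For anyone who wants to see the effect Jeff describes without touching the
kernel, here is a rough user-space analogy in Python. It is only a sketch, not
the sunrpc code: it assumes Linux TCP behaviour over loopback, and the helper
names (port_is_free, pick_port, shutdown_vs_close) are made up for
illustration. A socket explicitly bound to a nonzero source port is connected,
the peer closes first, and then we call shutdown() instead of close(). The
socket is finished as far as TCP is concerned, yet the port should stay
reserved until the descriptor itself is closed.

```python
import socket
import time


def port_is_free(port, host="127.0.0.1"):
    """Return True if a fresh socket can bind (host, port) without SO_REUSEADDR."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


def pick_port(host="127.0.0.1"):
    """Ask the kernel for a currently unused port number, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tmp:
        tmp.bind((host, 0))
        return tmp.getsockname()[1]


def shutdown_vs_close(host="127.0.0.1"):
    """Return (port_held_after_shutdown, port_free_after_close)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen(1)

    # Bind to an explicit, nonzero source port before connecting
    # (the analogy to xs_bind() picking a source port).
    sport = pick_port(host)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.bind((host, sport))
    client.connect(listener.getsockname())
    conn, _ = listener.accept()

    # The peer closes first; recv() returning b"" means its FIN has arrived.
    conn.close()
    client.recv(1)

    # shutdown() ends the TCP conversation but keeps the descriptor open.
    client.shutdown(socket.SHUT_RDWR)
    held_after_shutdown = not port_is_free(sport, host)

    # Only close() drops the socket and, with it, the port reservation.
    client.close()
    free_after_close = False
    for _ in range(40):  # short grace period in case teardown is still in flight
        if port_is_free(sport, host):
            free_after_close = True
            break
        time.sleep(0.05)

    listener.close()
    return held_after_shutdown, free_after_close


if __name__ == "__main__":
    print(shutdown_vs_close())
```

The probe deliberately avoids SO_REUSEADDR so that any lingering binding shows
up as a bind failure.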

Steven, how about something like the following patch?

8<-----------------------------------------------------------------