Re: [PATCH] nfsd: properly tear down server when write_ports fails

From: NeilBrown
Date: Tue Dec 12 2023 - 22:46:02 EST


On Wed, 13 Dec 2023, Chuck Lever wrote:
> On Tue, Dec 12, 2023 at 08:50:46AM -0500, Chuck Lever wrote:
> > On Mon, Dec 11, 2023 at 06:11:04PM -0500, Jeff Layton wrote:
> > > On Tue, 2023-12-12 at 09:59 +1100, NeilBrown wrote:
> > > > On Tue, 12 Dec 2023, Jeff Layton wrote:
> > > > > When the initial write to the "portlist" file fails, we'll currently put
> > > > > the reference to the nn->nfsd_serv, but leave the pointer intact. This
> > > > > leads to a UAF if someone tries to write to "portlist" again.
> > > > >
> > > > > Simple reproducer, from a host with nfsd shut down:
> > > > >
> > > > > # echo "foo 2049" > /proc/fs/nfsd/portlist
> > > > > # echo "foo 2049" > /proc/fs/nfsd/portlist
> > > > >
> > > > > The kernel will oops on the second one when it trips over the dangling
> > > > > nn->nfsd_serv pointer. There is a similar bug in __write_ports_addfd.
> > > > >
> > > > > This patch fixes it by adding some extra logic to nfsd_put to ensure
> > > > > that nfsd_last_thread is called prior to putting the reference when the
> > > > > conditions are right.
> > > > >
> > > > > Fixes: 9f28a971ee9f ("nfsd: separate nfsd_last_thread() from nfsd_put()")
> > > > > Closes: https://issues.redhat.com/browse/RHEL-19081
> > > > > Reported-by: Zhi Li <yieli@xxxxxxxxxx>
> > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > ---
> > > > > This should probably go to stable, but we'll need to backport for v6.6
> > > > > since older kernels don't have nfsd_nl_rpc_status_get_done. We should
> > > > > just be able to drop that hunk though.
> > > > > ---
> > > > > fs/nfsd/nfsctl.c | 32 ++++++++++++++++++++++++++++----
> > > > > fs/nfsd/nfsd.h | 8 +-------
> > > > > fs/nfsd/nfssvc.c | 2 +-
> > > > > 3 files changed, 30 insertions(+), 12 deletions(-)
> > > >
> > > > This is much the same as
> > > >
> > > > https://lore.kernel.org/linux-nfs/20231030011247.9794-2-neilb@xxxxxxx/
> > > >
> > > > It seems that didn't land. Maybe I dropped the ball...
> > >
> > > Indeed it is. I thought the problem seemed familiar. Your set is
> > > considerably more comprehensive. Looks like I even sent some Reviewed-
> > > bys when you sent it.
> > >
> > > Chuck, can we get these in or was there a problem with them?
> >
> > Offhand, I'd say either I was waiting for some review comments
> > to be addressed or the mail got lost (vger or Exchange or I
> > accidentally deleted the series). I'll go take a look.
>
> I reviewed the thread:
>
> https://lore.kernel.org/linux-nfs/20231030011247.9794-1-neilb@xxxxxxx/
>
> From the looks of it, I was expecting Neil to address a couple of
> review comments and repost. These are the two comments that stand
> out to me now:
>
> On 1/5:
>
> > > Then let's add
> > >
> > > Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount")
> > >
> > > to this one, since it addresses a crasher seen in the wild.
> >
> > Sounds good.
> >
> > > > but it won't fix the hinky error cleanup in nfsd_svc. It looks like that
> > > > does get fixed in patch #4 though, so I'm not too concerned.
> > >
> > > Does that fix also need to be backported?
> >
> > I think so, but we might want to split that out into a more targeted
> > patch and apply it ahead of the rest of the series. Our QA folks seem to
> > be able to hit the problem somehow, so it's likely to be triggered by
> > people in the field too.
>
> This last paragraph requests a bit of reorganization to enable an
> easier backport.

I think the "error cleanup" was addressed in a different series. Maybe
it hasn't landed either.

>
> And on 2/5:
>
> > > > > +struct pool_private {
> > > > > + struct svc_serv *(*get_serv)(struct seq_file *, bool);
> > > >
> > > > This bool is pretty ugly. I think I'd rather see two operations here
> > > > (get_serv/put_serv). Also, this could use a kerneldoc comment.
> > >
> > > I agree that bool is ugly, but two function pointers as function args
> > > seemed ugly, and stashing them in 'struct svc_serv' seemed ugly.
> > > So I picked one. I'd be keen to find an approach that didn't require a
> > > function pointer.
> > >
> > > Maybe sunrpc could declare
> > >
> > > struct svc_ref {
> > > struct mutex mutex;
> > > struct svc_serv *serv;
> > > }
> > >
> > > and nfsd could use one of those instead of nfsd_mutex and nfsd_serv, and
> > > pass a pointer to it to the open function.
> > >
> > > But then the mutex would have to be in the per-net structure. And maybe
> > > that isn't a bad idea, but it is a change...
> > >
> > > I guess I could pass pointers to nfsd_mutex and nn->nfsd_serv to the
> > > open function....
> > >
> > > Any other ideas?
> >
> > I think just passing two function pointers to svc_pool_stats_open, and
> > storing them both in the serv is the best solution (for now). Like you
> > said, there are no clean options here. That function only has one caller
> > though, so at least the nastiness will be confined to that.
> >

We can't store the function pointers in the serv because the purpose of
the first function is to find the serv.

I guess I should just repost everything again.... but it isn't a good
time for year for sustained debates.

NeilBrown


> > Moving the mutex to be per-net does make a lot of sense, but I think
> > that's a separate project. If you decide to do that and it allows you to
> > make a simpler interface for handling the get/put_serv pointers, then
> > the interface can be reworked at that point.
>
> The other requests I see in that thread have already been answered
> adequately.
>
>
> --
> Chuck Lever
>