Re: Oops in rpc_clnt_debugfs_register() from debugfs change

From: Greg Kroah-Hartman
Date: Tue Feb 12 2019 - 09:42:20 EST


On Tue, Feb 12, 2019 at 03:37:20PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Feb 12, 2019 at 02:31:14PM +0000, David Howells wrote:
> > I've bisected an oops that occurs in rpc_clnt_debugfs_register() trying to
> > dereference a pointer with -EACCES in it. This is the causing commit, though
> > I suspect the bug is in sunrpc expecting to see NULL rather than an error.
> >
> > ff9fb72bc07705c00795ca48631f7fffe24d2c6b is the first bad commit
> > commit ff9fb72bc07705c00795ca48631f7fffe24d2c6b
> > Author: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > Date: Wed Jan 23 11:28:14 2019 +0100
> >
> > debugfs: return error values, not NULL
> >
> > When an error happens, debugfs should return an error pointer value, not
> > NULL. This will prevent the totally theoretical error where a debugfs
> > call fails due to lack of memory, returning NULL, and that dentry value
> > is then passed to another debugfs call, which would end up succeeding,
> > creating a file at the root of the debugfs tree, but would then be
> > impossible to remove (because you can not remove the directory NULL).
> >
> > So, to make everyone happy, always return errors, this makes the users
> > of debugfs much simpler (they do not have to ever check the return
> > value), and everyone can rest easy.
> > ...
> >
> > The attached oops occurs during boot from the gssproxy process in
> > rpc_clnt_debugfs_register(). The code at this point is:
> >
> > 0xffffffff8195cbdd <+450>: mov 0x50(%rax),%rcx <--- oopsing
> > 0xffffffff8195cbe1 <+454>: mov $0xffffffff821cc8ba,%rdx
> > 0xffffffff8195cbe8 <+461>: mov $0x18,%esi
> > 0xffffffff8195cbed <+466>: lea -0x30(%rbp),%rdi
> > 0xffffffff8195cbf1 <+470>: callq 0xffffffff819db773 <snprintf>
> >
> > RAX is -EACCES.
> >
> > Looking in the source:
> >
> > 	len = snprintf(name, sizeof(name), "../../rpc_xprt/%s",
> > 		       xprt->debugfs->d_name.name);
> >
> > I think xprt->debugfs is the value in RAX.
> >
> > (gdb) p &((struct dentry *)0)->d_name.name
> > $5 = (const unsigned char **) 0x50 <irq_stack_union+80>
> >
> > which matches the offset on the oopsing MOV instruction.
> >
> > This is with linus/master (aa0c38cf39de73bf7360a3da8f1707601261e518).
>
> Ugh, yeah, I see the problem, sorry about that.
>
> I wonder why the debugfs call is always failing, that's not good...
>
> let me dig and see if I already have a patch for this...

I have a much larger cleanup patch for this code, but this single-line
change should solve the issue for now. Can you test it to verify?

thanks,

greg k-h

------------------

diff --git a/net/sunrpc/debugfs.c b/net/sunrpc/debugfs.c
index 45a033329cd4..19bb356230ed 100644
--- a/net/sunrpc/debugfs.c
+++ b/net/sunrpc/debugfs.c
@@ -146,7 +146,7 @@ rpc_clnt_debugfs_register(struct rpc_clnt *clnt)
 	rcu_read_lock();
 	xprt = rcu_dereference(clnt->cl_xprt);
 	/* no "debugfs" dentry? Don't bother with the symlink. */
-	if (!xprt->debugfs) {
+	if (IS_ERR_OR_NULL(xprt->debugfs)) {
 		rcu_read_unlock();
 		return;
 	}
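
For anyone following along, here is a minimal userspace sketch of why the
old check lets -EACCES through and the new one does not. It is not the
kernel code: ERR_PTR(), PTR_ERR(), IS_ERR() and IS_ERR_OR_NULL() are
re-implemented here on the assumption that they behave like the helpers in
include/linux/err.h (error codes encoded in the top 4095 pointer values),
and old_check_allows()/new_check_allows() are hypothetical names that only
mirror the two conditions in the diff above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same limit the kernel uses for error-encoded pointers. */
#define MAX_ERRNO 4095

/* Userspace stand-ins modelled on the kernel's err.h helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers live in the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/*
 * Hypothetical helpers: return 1 if the caller would go on to dereference
 * the dentry, 0 if it would bail out early.
 */
static inline int old_check_allows(const void *debugfs)
{
	/* Old test: "if (!xprt->debugfs) return;" only catches NULL. */
	return debugfs != NULL;
}

static inline int new_check_allows(const void *debugfs)
{
	/* New test: "if (IS_ERR_OR_NULL(xprt->debugfs)) return;" */
	return !IS_ERR_OR_NULL(debugfs);
}
```

With a dentry value of ERR_PTR(-EACCES), old_check_allows() returns 1, so
the code would continue to the d_name.name load at offset 0x50 from an
invalid address, which matches the oopsing MOV in the report.
new_check_allows() returns 0 for both NULL and the error pointer, and 1
for an ordinary valid pointer.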