Re: [PATCH 4.19 000/338] 4.19.238-rc1 review

From: Michael Trimarchi
Date: Fri Dec 16 2022 - 13:32:13 EST


Hi Neil

On Tue, Apr 26, 2022 at 12:29:55PM +1000, NeilBrown wrote:
> On Thu, 21 Apr 2022, Naresh Kamboju wrote:
> > On Mon, 18 Apr 2022 at 14:09, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 14 Apr 2022 at 18:45, Greg Kroah-Hartman
> > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > This is the start of the stable review cycle for the 4.19.238 release.
> > > > There are 338 patches in this series, all will be posted as a response
> > > > to this one. If anyone has any issues with these being applied, please
> > > > let me know.
> > > >
> > > > Responses should be made by Sat, 16 Apr 2022 11:07:54 +0000.
> > > > Anything received after that time might be too late.
> > > >
> > > > The whole patch series can be found in one patch at:
> > > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.238-rc1.gz
> > > > or in the git tree and branch at:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> > > > and the diffstat can be found below.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > >
> > > Following kernel warning noticed on arm64 Juno-r2 while booting
> > > stable-rc 4.19.238. Here is the full test log link [1].
> > >
> > > [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
> > > [ 0.000000] Linux version 4.19.238 (tuxmake@tuxmake) (gcc version
> > > 11.2.0 (Debian 11.2.0-18)) #1 SMP PREEMPT @1650206156
> > > [ 0.000000] Machine model: ARM Juno development board (r2)
> > > <trim>
> > > [ 18.499895] ================================
> > > [ 18.504172] WARNING: inconsistent lock state
> > > [ 18.508451] 4.19.238 #1 Not tainted
> > > [ 18.511944] --------------------------------
> > > [ 18.516222] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > > [ 18.522242] kworker/u12:3/60 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > [ 18.527826] (____ptrval____)
> > > (&(&xprt->transport_lock)->rlock){+.?.}, at: xprt_destroy+0x70/0xe0
> > > [ 18.536648] {IN-SOFTIRQ-W} state was registered at:
> > > [ 18.541543] lock_acquire+0xc8/0x23c
>
> Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
> spin_unlock_bh().
>

We are seeing the same deadlock, or a similar one, on 4.19.243, and we
think it may be connected to this thread. It is a bit difficult for us
to reproduce, but we are going to apply this change:

net: sunrpc: Fix deadlock in xprt_destroy

Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
spin_unlock_bh().

Signed-off-by: Michael Trimarchi <michael@xxxxxxxxxxxxxxxxxxxx>
---
net/sunrpc/xprt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d05fa7c36d00..b1abf4848bbc 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1550,9 +1550,9 @@ static void xprt_destroy(struct rpc_xprt *xprt)
 	 * is cleared. We use ->transport_lock to ensure the mod_timer()
 	 * can only run *before* del_time_sync(), never after.
 	 */
-	spin_lock(&xprt->transport_lock);
+	spin_lock_bh(&xprt->transport_lock);
 	del_timer_sync(&xprt->timer);
-	spin_unlock(&xprt->transport_lock);
+	spin_unlock_bh(&xprt->transport_lock);

 	/*
 	 * Destroy sockets etc from the system workqueue so they can
--
2.37.2

> Thanks,
> NeilBrown
>

Thank you
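
For anyone following the thread, the rule behind the lockdep report quoted
above can be modelled in userspace. This is only a sketch, not kernel code:
lock_class_model, acquire_ctx, model_acquire() and model_xprt_destroy() are
hypothetical names made up for illustration, and real lockdep tracks far
more state. It assumes, as the report shows, that ->transport_lock has
already been taken in softirq context ({IN-SOFTIRQ-W}), and then checks
what happens when xprt_destroy() takes it from a kworker with or without
the _bh variant.

```c
#include <stdbool.h>

/* Hypothetical, much simplified stand-in for lockdep's per-class usage bits. */
struct lock_class_model {
	bool used_in_softirq;   /* taken while serving a softirq: IN-SOFTIRQ-W */
	bool taken_softirq_on;  /* taken with softirqs enabled:   SOFTIRQ-ON-W */
};

/* Context in which one acquisition happens (illustrative only). */
struct acquire_ctx {
	bool in_softirq;        /* SC1 in the lockdep header */
	bool softirqs_enabled;  /* SE1 in the lockdep header */
};

/*
 * Record one acquisition and return true if the lock class now has
 * both usage bits set, i.e. a softirq could interrupt a holder that
 * left softirqs enabled and then spin on the same lock forever.
 */
static bool model_acquire(struct lock_class_model *lc, struct acquire_ctx ctx)
{
	if (ctx.in_softirq)
		lc->used_in_softirq = true;
	else if (ctx.softirqs_enabled)
		lc->taken_softirq_on = true;

	return lc->used_in_softirq && lc->taken_softirq_on;
}

/*
 * Model the sequence from the report: the lock is first taken from
 * softirq context, then xprt_destroy() takes it from process context.
 * use_bh selects spin_lock_bh() (softirqs disabled while held) versus
 * plain spin_lock() (softirqs left enabled).
 */
static bool model_xprt_destroy(bool use_bh)
{
	struct lock_class_model transport_lock = { false, false };
	struct acquire_ctx softirq_path = { true, false };
	struct acquire_ctx destroy_path = { false, !use_bh };

	/* Softirq user registers IN-SOFTIRQ-W; no conflict on its own. */
	(void)model_acquire(&transport_lock, softirq_path);

	/* Process-context user: inconsistent only if softirqs stay enabled. */
	return model_acquire(&transport_lock, destroy_path);
}
```

Under these assumptions, model_xprt_destroy(false) reports the
inconsistency that the warning describes, while model_xprt_destroy(true)
does not, because spin_lock_bh() disables softirqs on the local CPU
before the lock is taken.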