Re: nfs4_schedule_state_manager stuck in tight loop

From: Benjamin Coddington
Date: Tue Mar 05 2024 - 06:39:55 EST


On 5 Mar 2024, at 1:09, Christian Theune wrote:

> Hi,
>
> not sure whether I may have missed a response that didn’t make it back to me or any of the lists.
>
> Just in case, because the CC didn’t include the original addendum I made to my report:
>
> Addendum:
>
> I’ve checked kernel changelogs since then but didn’t find anything that I could relate to this aside from *maybe* dfda2a5eb66a685aa6d0b81c0cef1cf8bfe0b3c4 (rename(): fix the locking of subdirectories) which mentions NFS but doesn’t describe the potential impact.
>
> We’re running 5.15.148 now and as it’s been another 2 months there might be the chance of another lockup in the near future ;)
>
> If anyone has ideas on how to debug/approach a reproducer I’d be more than happy to help and try to provide more data.
>
> Cheers,
> Christian

When the problem occurs, use the

nfs4:nfs4_state_mgr
nfs4:nfs4_state_mgr_failed

tracepoints to see what the state manager might be doing.
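
One possible way to switch these two tracepoints on and watch their output is sketched below. It is a minimal sketch, not a tested procedure for this particular lockup: it assumes tracefs is mounted at /sys/kernel/tracing (older setups may expose it at /sys/kernel/debug/tracing), that the script runs as root, and the helper names (enable_path, set_events, follow, main) are made up for illustration.

```python
from pathlib import Path

# Assumed tracefs mount point; older setups may use /sys/kernel/debug/tracing.
TRACEFS = Path("/sys/kernel/tracing")

# The two tracepoints named above, written as "subsystem:event".
EVENTS = ("nfs4:nfs4_state_mgr", "nfs4:nfs4_state_mgr_failed")


def enable_path(event, tracefs=TRACEFS):
    """Return the tracefs 'enable' file for a "subsystem:event" name."""
    subsystem, name = event.split(":", 1)
    return Path(tracefs) / "events" / subsystem / name / "enable"


def set_events(enabled, events=EVENTS, tracefs=TRACEFS):
    """Write 1 (on) or 0 (off) to each event's enable file; return the paths."""
    paths = []
    for event in events:
        path = enable_path(event, tracefs)
        path.write_text("1" if enabled else "0")
        paths.append(path)
    return paths


def follow(tracefs=TRACEFS):
    """Stream trace_pipe to stdout; blocks until interrupted."""
    with open(Path(tracefs) / "trace_pipe") as pipe:
        for line in pipe:
            print(line, end="")


def main():
    """Enable both events, stream the trace, and disable them on exit."""
    set_events(True)
    try:
        follow()
    except KeyboardInterrupt:
        pass
    finally:
        set_events(False)
```

Calling main() as root while the state manager is spinning should print one line per event as it fires; Ctrl-C stops the stream and disables the events again.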

Also, a network capture might show what the state manager thread is up to,
if it is sending operations.
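
For the capture, a small helper that assembles a tcpdump command line might look like the sketch below. The assumptions are loud ones: that the mount uses the standard NFS port 2049, that tcpdump is installed on the client, and that 192.0.2.10 stands in for the real server address. The function name and the output file name are hypothetical.

```python
import shlex

# Assumption: the mount uses the standard NFS port.
NFS_PORT = 2049


def tcpdump_command(server, outfile="nfs4-state-mgr.pcap", interface="any"):
    """Build a tcpdump argument list that captures full packets to a file."""
    return [
        "tcpdump",
        "-i", interface,   # capture interface ("any" is Linux-specific)
        "-s", "0",         # snapshot length 0: do not truncate packets
        "-w", outfile,     # write raw packets to this file
        "host", server, "and", "port", str(NFS_PORT),
    ]


def tcpdump_shell_line(server):
    """Return the same command as a single shell-quoted string."""
    return shlex.join(tcpdump_command(server))
```

tcpdump_shell_line("192.0.2.10") gives a line that can be pasted into a root shell; the resulting pcap file can then be opened in Wireshark to see which operations, if any, the state manager keeps sending.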

Ben