Re: 2.4.24 SMP lockups

From: David Woodhouse
Date: Wed Jan 14 2004 - 13:31:16 EST


On Wed, 2004-01-14 at 09:07 -0800, Simon Kirby wrote:
> I also have an entire sysrq-T, but it is for over 500 processes, so I
> posted the entire serial capture log as well, as a few other things
> here:
>
> http://blue.netnation.com/sim/ref/2.4.24_stuck_cpu/

Perfect report; thanks.

It deadlocked in attempting to get a spinlock, in remove_wait_queue().

(Look at the address it wanted to jump to when it got the lock, from
0xc011c7cf to 0xc011c7cf+0xffffe996 == 0xc011b165).

This is most probably because the remove_wait_queue() in
__wait_on_freeing_inode() is removing us from a waitqueue in an inode
which has already been freed. The memory which used to hold a spinlock
has been reused, and it now looks locked, so we wait. For ever.

This differs from the working 2.6 version, where the waitqueue is in a
hash table and doesn't go away.

I _think_ it's true that the _only_ way we can get woken from
__wait_on_freeing_inode() is if the inode has actually been destroyed, in
which case it's fine just to _not_ remove ourselves from the (defunct)
wait queue, and to return. But I need to stare hard at it some more,
have another cup of tea, and ask Al :)

If I'm right in the above, then this should work....

===== fs/inode.c 1.47 vs edited =====
*** /tmp/inode.c-1.47-18008 Thu Jan 8 12:23:51 2004
--- fs/inode.c Wed Jan 14 18:25:33 2004
*************** static void __wait_on_freeing_inode(stru
*** 264,270 ****
--- 264,274 ----
set_current_state(TASK_UNINTERRUPTIBLE);
spin_unlock(&inode_lock);
schedule();
+ /* Inode is dead or dying. The wait queue is obsolete and we don't need to
+ remove ourselves from it. More to the point we _mustn't_ remove ourselves
+ since it may already have been freed
remove_wait_queue(&inode->i_wait, &wait);
+ */
spin_lock(&inode_lock);
}



--
dwmw2
