Re: [2.6.38-3.x] [BUG] soft lockup - CPU#X stuck for 23s! (vfs, autofs, vserver)

From: Paweł Sikora
Date: Mon Sep 24 2012 - 01:24:50 EST


On Sunday 23 of September 2012 18:10:30 Linus Torvalds wrote:
> On Sat, Sep 22, 2012 at 11:09 PM, Paweł Sikora <pluto@xxxxxxxxxxxxx> wrote:
> >
> > br_read_lock(vfsmount_lock);
>
> The vfsmount_lock is a "local-global" lock, where a read-lock is
> rather cheap and takes just a per-cpu lock, but the downside is that a
> write-lock is *very* expensive, and can cause serious trouble.
>
> And the write lock is taken by the [un]mount() paths. Do *not* do
> crazy things. If you do some insane "unmount and remount autofs" on a
> 1s granularity, you're doing insane things.
>
> Why do you have that 1s timeout? Insane.
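
The read/write asymmetry of a "local-global" lock can be shown with a small userspace sketch. This is only an analogy. NR_SLOTS stands in for the CPU count, and an array of pthread mutexes stands in for the kernel's per-CPU spinlocks. The helper names (br_read_lock_slot(), br_write_lock_all() and so on) are hypothetical and are not the kernel's br_*/lglock API. A reader locks only its own slot, while a writer must lock every slot:

```c
#include <assert.h>
#include <pthread.h>

/* HYPOTHETICAL userspace analogue of a "local-global" (big-reader) lock.
 * NR_SLOTS stands in for the number of CPUs; the kernel uses real per-CPU
 * spinlocks, not an array of pthread mutexes. */
#define NR_SLOTS 4

struct brlock {
    pthread_mutex_t slot[NR_SLOTS];
};

static void br_init(struct brlock *br)
{
    for (int i = 0; i < NR_SLOTS; i++)
        pthread_mutex_init(&br->slot[i], NULL);
}

/* Read side: lock only the caller's own slot. Readers on different
 * "CPUs" never touch each other's lock, which is why reads are cheap. */
static void br_read_lock_slot(struct brlock *br, int cpu)
{
    assert(cpu >= 0 && cpu < NR_SLOTS);
    pthread_mutex_lock(&br->slot[cpu]);
}

static void br_read_unlock_slot(struct brlock *br, int cpu)
{
    assert(cpu >= 0 && cpu < NR_SLOTS);
    pthread_mutex_unlock(&br->slot[cpu]);
}

/* Write side: lock every slot, always in the same order so that two
 * writers cannot deadlock. Each acquisition waits for whichever reader
 * currently holds that slot, so the writer pays for all "CPUs" at once.
 * Returns the number of locks taken. */
static int br_write_lock_all(struct brlock *br)
{
    for (int i = 0; i < NR_SLOTS; i++)
        pthread_mutex_lock(&br->slot[i]);
    return NR_SLOTS;
}

static void br_write_unlock_all(struct brlock *br)
{
    for (int i = NR_SLOTS - 1; i >= 0; i--)
        pthread_mutex_unlock(&br->slot[i]);
}
```

A read costs one uncontended lock. A write, like the one taken on every mount or umount, costs NR_SLOTS locks and must wait out every reader currently holding any slot.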

The 1s unmount timeout is *only* for fast bug reproduction (within a few seconds of Opteron startup)
and for testing potential patches. Normally, with a 60s timeout, it happens within a few minutes to hours
(depending on the machine's I/O and CPU load) and makes the server unusable (permanent soft lockup).
Can we redesign vserver's mnt_is_reachable() to use better locking and avoid the total soft lockup?

BR,
Paweł.

P.S. I'm adding Herbert to CC.
