Re: lockdep and debug objects together are broken?

From: Ingo Molnar
Date: Wed Jan 21 2009 - 06:42:49 EST



* Nick Piggin <npiggin@xxxxxxx> wrote:

> On Tue, Jan 20, 2009 at 10:11:47PM +0100, Vegard Nossum wrote:
> > On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin <npiggin@xxxxxxx> wrote:
> > > Hi,
> > >
> > > I've had a problem frustrating my testing because lockdep was silently turning
> > > itself off... I patched out the code to disable lockdep after the first error,
> > > and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> > > first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> > > migration thread was reported to try to unlock rq->lock although it was
> > > holding no locks. Then init was reported to return to userspace without
> > > releasing an objectdebug hash lock.
> > >
> > > All that went away and everything seemed to work properly with debug objects
> > > configured out.
> > >
> > > I didn't get too far in trying to debug the problem. But it should be easy
> > > enough to reproduce (if not, I can post traces or test patches).
> >
> > I just built a kernel with lockdep and debugobjects enabled, and
> > everything seemed fine. I think you should post your kernel version,
> > config, and the lockdep patch (if needed -- it didn't seem to turn
> > itself off here).
>
> Are you sure? Ie. sysrq+D a still works properly? In that case, you
> wouldn't need the lockdep patch because it just prevents the feature from being
> switched off.
>
> I'll have to dig a bit further, then. The annoying thing is that
> lockdep turns itself off at the drop of a hat (and this particular
> problem seems to happen without any backtraces), so it invalidates
> all your lockdep testing if you don't realise it has turned itself
> off.
>
> Is there a way to re-arm lockdep? That would be neat.

Not at the moment, and it looks somewhat complicated. All lock state
freezes the moment lockdep disarms itself. That's very much a key design
element: rarely will you see any real lockdep-inflicted crash - even if it
has a bug it is self-disabling itself and running for the door very
efficiently.

So by the time you'd rearm, there's a lot of tasks with no proper locking
state built up. We might be able to re-arm via stop_machine_run perhaps.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/