Re: 2.6.13-rc6-rt6

From: Ingo Molnar
Date: Wed Aug 17 2005 - 11:23:05 EST



* Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Wed, 2005-08-17 at 10:24 -0400, Steven Rostedt wrote:
>
> > OK the output from netconsole still seems like netconsole itself is
> > causing some problems. But I think it is also showing this lockup. I'll
> > recompile my kernel as UP and see if netconsole works fine.
>
> Well, the UP kernel boots on my laptop, but netconsole gives strange
> warnings.
>
> OK, what's the scoop with the illegal_API_call? What is it about, and
> what is the expected work around?

this is a recent change: i've started flagging "naked" use of
local_irq_disable(), because it's a problem on PREEMPT_RT and it's a
potential SMP bug on upstream kernels. A local_irq_disable() is
converted either to raw_local_irq_disable() when justified (it's mostly
only justified for lowlevel arch code), or is eliminated totally.
(either by merging it into a nearby spin_lock API call, or by removing
it altogether, making sure that code doesnt break).

Right now we print a warning on the first such API use, and then shut up
about it. All local_irq_() APIs map to NOPs. (we keep the PF_IRQSOFF
flag for compatibility, but only to get irqs_off() right which in turn
shuts off a number of BUG_ON(!irqs_disabled()) warnings, and it doesnt
have any other functional purpose.)

the desired end-result would be the total elimination of local_irq_*()
API calls.

> I'm also getting the following output on shutdown:
>
> NET: Registered protocol family 31
> Bluetooth: HCI device and connection manager initialized
> Bluetooth: HCI socket layer initialized
> Bluetooth: L2CAP ver 2.7
> Bluetooth: L2CAP socket layer initialized
> Bluetooth: RFCOMM ver 1.5
> Bluetooth: RFCOMM socket layer initialized
> Bluetooth: RFCOMM TTY layer initialized
> BUG: nonzero lock count 1 at exit time?
> nfsd: 4696 [f7183830, 115]
> [<c0136922>] check_no_held_locks+0x62/0x330 (8)
> [<c011df67>] do_exit+0x257/0x480 (32)
> [<c013d052>] __module_put_and_exit+0x52/0x70 (40)
> [<f8d54583>] nfsd+0x2b3/0x340 [nfsd] (12)
> [<f8d542d0>] nfsd+0x0/0x340 [nfsd] (48)
> [<c010140d>] kernel_thread_helper+0x5/0x18 (16)
> ---------------------------
> | preempt count: 00000000 ]
> | 0-level deep critical section nesting:
> ----------------------------------------
>
> ------------------------------
> | showing all locks held by: | (nfsd/4696 [f7183830, 116]):
> ------------------------------
>
> #001: [c038e184] {kernel_sem.lock}
> ... acquired at: lock_kernel+0x21/0x40
>
> BUG: nfsd/4696, BKL held at task exit time!

hm, it seems nfsd forgets to do an unlock_kernel() in some exit path it
seems? We are enforcing strict balanced lock use in PREEMPT_RT - the
upstream kernel is more relaxed about it.

> And it goes on and on. This happens everytime. Without netconsole, I
> only get the nonzero lock count error. Also, one of my lockups on SMP
> had to do with the kernel_thread_helper:
>
> Using IPI Shortcut mode
> khelper/794[CPU#0]: BUG in set_new_owner at kernel/rt.c:916

this is a 'must not happen'. Somehow lock->held list got non-empty.
Maybe some use-after-free thing? Havent seen it myself.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/