Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually

From: Linas Vepstas
Date: Wed Dec 20 2006 - 18:47:29 EST


On Thu, Dec 21, 2006 at 10:09:55AM +1100, Benjamin Herrenschmidt wrote:
>
> > This also doesn't explain the bizarre form of the failure on
> > power4 ... does power4 mishandle twi somehow? Will investigate
> > next.
>
> Or you hit it before you have a console ?

Ahh... I was about to claim I have a console, but on closer
examination: someone has been messing with consoles?
I don't get it ... The power4 has ony a serial connection.
the power5 has a graphics card in it, unused.

The power5 has "Console: colour dummy device 80x25"
kick in, at which point the time goes crazy as well.

There may be multiple bugs here.

The weird crash for 2.6.19-git7 on power4 looks like this:

Looking for displays
opening PHB /pci@xxxxxxxxxxxxxxx done
opening PHB /pci@xxxxxxxxxxxxxxx done
instantiating rtas at 0x000000002fd10000 ... done
0000000000000000 : boot cpu 0000000000000000
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00000000043a0000 -> 0x00000000043a135c
Device tree struct 0x00000000043b0000 -> 0x00000000043c0000
Calling quiesce ...
returning from prom_init
[ 0.000000] Starting Linux PPC64 #0 SMP Tue Dec 5 17:20:17 CST 2006
[ 0.000000] -----------------------------------------------------
[ 0.000000] ppc64_pft_size = 0x0
[ 0.000000] physicalMemorySize = 0x400000000
[ 0.000000] ppc64_caches.dcache_line_size = 0x80
[ 0.000000] ppc64_caches.icache_line_size = 0x80
[ 0.000000] htab_address = 0xc0000003e0000000
[ 0.000000] htab_hash_mask = 0x1fffff
[ 0.000000] -----------------------------------------------------
[ 0.000000] Linux version 2.6.19-git7linas (root@venn) (gcc version
4.1.0 (SUSE Linux)) #0 SMP Tue Dec 5 17:20:17 CST 2006
[ 0.000000] [boot]0012 Setup Arch
[ 0.000000] No ramdisk, default root is /dev/sda2
[ 0.000000] EEH: PCI Enhanced I/O Error Handling Enabled
[ 0.000000] PPC64 nvram contains 81920 bytes
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 262144
[ 0.000000] Normal 262144 -> 262144
[ 0.000000] early_node_map[1] active PFN ranges
[ 0.000000] 0: 0 -> 262144
[ 0.000000] [boot]0015 Setup Done
[ 0.000000] Built 1 zonelists. Total pages: 261888
[ 0.000000] Kernel command line: root=/dev/sda3 console=ttyS0
selinux=0 elevator=cfq kdb=on
[ 0.000000] [boot]0020 XICS Init
[ 0.000000] i8259 legacy interrupt controller initialized
[ 0.000000] [boot]0021 XICS Done
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
System assert at: file: rtas_io_config.c -- line: 195
rio_hub_num: 10
drawer_num: 6
phb_num: 3
buid: 7


which should be compared to 2.6.20-rc1-git6, build with the same
.config, on power4:

Looking for displays
opening PHB /pci@xxxxxxxxxxxxxxx done
opening PHB /pci@xxxxxxxxxxxxxxx done
instantiating rtas at 0x000000002fd10000 ... done
0000000000000000 : boot cpu 0000000000000000
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000004380000 -> 0x000000000438135c
Device tree struct 0x0000000004390000 -> 0x00000000043a0000
Calling quiesce ...
returning from prom_init
[ 0.000000] Starting Linux PPC64 #0 SMP Tue Dec 19 17:42:44 CST 2006
[ 0.000000] -----------------------------------------------------
[ 0.000000] ppc64_pft_size = 0x0
[ 0.000000] physicalMemorySize = 0x400000000
[ 0.000000] ppc64_caches.dcache_line_size = 0x80
[ 0.000000] ppc64_caches.icache_line_size = 0x80
[ 0.000000] htab_address = 0xc0000003e0000000
[ 0.000000] htab_hash_mask = 0x1fffff
[ 0.000000] -----------------------------------------------------
[ 0.000000] Linux version 2.6.20-rc1-git6linas (root@venn) (gcc
version 4.1.0 (SUSE Linux)) #0 SMP Tue Dec 19 17:42:44 CST 2006
[ 0.000000] [boot]0012 Setup Arch
[ 0.000000] No ramdisk, default root is /dev/sda2
[ 0.000000] EEH: PCI Enhanced I/O Error Handling Enabled
[ 0.000000] PPC64 nvram contains 81920 bytes
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 262144
[ 0.000000] Normal 262144 -> 262144
[ 0.000000] early_node_map[1] active PFN ranges
[ 0.000000] 0: 0 -> 262144
[ 0.000000] [boot]0015 Setup Done
[ 0.000000] Built 1 zonelists. Total pages: 261888
[ 0.000000] Kernel command line: root=/dev/sda3 console=ttyS0
selinux=0 elevator=cfq kdb=on
[ 0.000000] [boot]0020 XICS Init
[ 0.000000] i8259 legacy interrupt controller initialized
[ 0.000000] [boot]0021 XICS Done
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
cpu 0x0: Vector: 700 (Program Check) at [c0000000007a3980]
pc: c00000000007d574: .debug_mutex_unlock+0x5c/0x118
lr: c000000000468068: .__mutex_unlock_slowpath+0x104/0x198
sp: c0000000007a3c00
msr: 9000000000029032
current = 0xc000000000663690
paca = 0xc000000000663f80
pid = 0, comm = swapper
enter ? for help
[c0000000007a3c80] c000000000468068 .__mutex_unlock_slowpath+0x104/0x198
[c0000000007a3d20] c000000000231da8 .double_unlock_mutex+0x3c/0x58
[c0000000007a3db0] c00000000023b47c .dotest+0x5c/0x370
[c0000000007a3e50] c00000000023bc0c .locking_selftest+0x47c/0x17fc
[c0000000007a3ef0] c0000000005f06ec .start_kernel+0x1e4/0x344
[c0000000007a3f90] c0000000000084c8 .start_here_common+0x54/0x8c
0:mon>


And, exactly the same kernel, on power5:

Looking for displays
found display : /pci@800000020000002/pci@2,2/pci@1/display@0, opening
... done
instantiating rtas at 0x0000000007710000 ... done
0000000000000000 : boot cpu 0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000002580000 -> 0x000000000258141e
Device tree struct 0x0000000002590000 -> 0x00000000025a0000
[ 0.000000] Using pSeries machine description
[ 0.000000] Partition configured for 4 cpus.
[ 0.000000] Starting Linux PPC64 #0 SMP Tue Dec 19 17:42:44 CST 2006
[ 0.000000] -----------------------------------------------------
[ 0.000000] ppc64_pft_size = 0x19
[ 0.000000] physicalMemorySize = 0x72000000
[ 0.000000] ppc64_caches.dcache_line_size = 0x80
[ 0.000000] ppc64_caches.icache_line_size = 0x80
[ 0.000000] htab_address = 0x0000000000000000
[ 0.000000] htab_hash_mask = 0x3ffff
[ 0.000000] -----------------------------------------------------
[ 0.000000] Linux version 2.6.20-rc1-git6linas (root@venn) (gcc
version 4.1.0 (SUSE Linux)) #0 SMP Tue Dec 19 17:42:446[ 0.000000]
[boot]0012 Setup Arch
[ 0.000000] No ramdisk, default root is /dev/sda2
[ 0.000000] EEH: PCI Enhanced I/O Error Handling Enabled
[ 0.000000] PPC64 nvram contains 7168 bytes
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 29184
[ 0.000000] Normal 29184 -> 29184
[ 0.000000] early_node_map[1] active PFN ranges
[ 0.000000] 0: 0 -> 29184
[ 0.000000] [boot]0015 Setup Done
[ 0.000000] Built 1 zonelists. Total pages: 29156
[ 0.000000] Kernel command line: root=/dev/sda9 console=hvsi0 selinux=0 elevator=cfq rtas_msgs=1
[ 0.000000] [boot]0020 XICS Init
[ 0.000000] [boot]0021 XICS Done
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[97741.568069] Console: colour dummy device 80x25
[97742.318175] ------------------------
[97742.318182] | Locking API testsuite:
[97742.318189] ----------------------------------------------------------------------------
[97742.318200] | spin |wlock |rlock |mutex | wsem | rsem |
[97742.318211] --------------------------------------------------------------------------
[97742.318222] A-A deadlock:failed|failed| ok |failed|failed|failed|
[97742.318246] A-B-B-A deadlock:failed|failed| ok |failed|failed|failed|
[97742.318271] A-B-B-C-C-A deadlock:failed|failed| ok |failed|failed|failed|
[97742.318297] A-B-C-A-B-C deadlock:failed|failed| ok |failed|failed|failed|
[97742.318323] A-B-B-C-C-D-D-A deadlock:failed|failed| ok |failed|failed|failed|
[97742.318352] A-B-C-D-B-D-D-A deadlock:failed|failed| ok |failed|failed|failed|
[97742.318380] A-B-C-D-B-C-D-A deadlock:failed|failed| ok |failed|failed|failed|
[97742.318408] double unlock: ok | ok |failed|<0>------------[ cut here ]------------
[97742.318440] Kernel BUG at c00000000007d574 [verbose debug info
unavailable]
cpu 0x0: Vector: 700 (Program Check) at [c0000000007a3980]
pc: c00000000007d574: .debug_mutex_unlock+0x5c/0x118
lr: c000000000468068: .__mutex_unlock_slowpath+0x104/0x198
sp: c0000000007a3c00
msr: 8000000000029032
current = 0xc000000000663690
paca = 0xc000000000663f80
pid = 0, comm = swapper
enter ? for help
[c0000000007a3c80] c000000000468068 .__mutex_unlock_slowpath+0x104/0x198
[c0000000007a3d20] c000000000231da8 .double_unlock_mutex+0x3c/0x58
[c0000000007a3db0] c00000000023b47c .dotest+0x5c/0x370
[c0000000007a3e50] c00000000023bc0c .locking_selftest+0x47c/0x17fc
[c0000000007a3ef0] c0000000005f06ec .start_kernel+0x1e4/0x344
[c0000000007a3f90] c0000000000084c8 .start_here_common+0x54/0x8c
0:mon> r

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/