Debugging help

From: Arador (diegocg@teleline.es)
Date: Wed Mar 19 2003 - 08:10:33 EST


Hi, i've a _very_ strange hang in 2.5 and i'd like to hear some advices,
as i don't know how to get any clue of what is happening to help to fix
it.

The hangup has been happening in 2.5 from ages (say 20-30 releases).
I'm sorry i've not reported it before.

I've not had any hangs under 2.4 that i remember of; so i discard
hardware problem. The machine also passes memtest86.
(details of the machine below).

The hang happens with and without X running. The system just freezes.
There isn't oops or messages (checked without X).

Sysrq doesn't work. There aren't cool blinking keyboard leds either :)
When it happens; i can hear how the fan speeds up a bit; this only
happens when the cpu usage is near 100%.

But it happens a strange thing when i'm under X. The system is hanged
(sysrq not working). But i can see how the screen refresh. Not processes;
but mouse pointer. If i move the mouse in that situation, i see how the
mouse answers; it takes ~5 seconds to refresh but it does. Also, if
the pointer is near the border of a window, the pointer changes to a
arrow; like when you are going to resize it. I've seen this happening
with and without DRI enabled.

The hangs are 100% moon dependant. I can have 5 in 15 minutes or zero in
a day. I've noticed that before the last big -ac ide merge it used to
happen a lot more (so i blamed some ide bug that is triggered much less by the
-ac code).

Machine doesn't answer to pings. ssh sessions into the machine die.

Machine is SMP P3 2x800; 256 MB ram; elitegroup D6VAA motherboard, latest
BIOS revision, .config attached (it also happens with and without preempt).
Filesystem is ext3; debian sid environment.

estel:~# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586/B/686A/B PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 16)
00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 16)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
00:09.0 VGA compatible controller: 3Dfx Interactive, Inc. Voodoo 3 (rev 01)
00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)

typical modules loaded:
estel:/tmp# lsmod
Module Size Used by
af_packet 20232 0 [unsafe]
ppp_deflate 5344 0 [unsafe]
zlib_deflate 21944 1 ppp_deflate
zlib_inflate 21408 1 ppp_deflate
bsd_comp 5632 0 [unsafe]
ppp_async 10688 0 [unsafe]
ppp_generic 29000 3 ppp_deflate,bsd_comp,ppp_async
slhc 6048 1 ppp_generic
tdfx 44024 1
ide_cd 37568 0
cdrom 32992 1 ide_cd
lp 9860 0
3c59x 33000 1

Linux version 2.5.65 (diego@estel) (gcc versión 3.2.3 20030309 (Debian prerelease)) #5 SMP mar mar 18 19:15:13 CET 2003
Video mode to be used for restore is 30a
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
 BIOS-e820: 000000000fff0000 - 000000000fff3000 (ACPI NVS)
 BIOS-e820: 000000000fff3000 - 0000000010000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
255MB LOWMEM available.
found SMP MP-table at 000f64e0
hm, page 000f6000 reserved twice.
hm, page 000f7000 reserved twice.
hm, page 000f1000 reserved twice.
hm, page 000f2000 reserved twice.
On node 0 totalpages: 65520
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 61424 pages, LIFO batch:14
  HighMem zone: 0 pages, LIFO batch:1
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 6:8 APIC version 17
Processor #1 6:8 APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode: Flat. Using 1 I/O APICs
Processors: 2
Building zonelist for node : 0
Kernel command line: root=/dev/hda5 ro vga=0x30a profile=2 nmi_watchdog=1
kernel profiling enabled
Initializing CPU#0
PID hash table entries: 1024 (order 10: 8192 bytes)
Detected 802.853 MHz processor.
Console: colour VGA+ 132x43
Calibrating delay loop... 1581.05 BogoMIPS
Memory: 254184k/262080k available (1843k kernel code, 7168k reserved, 497k data, 160k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
-> /dev
-> /dev/console
-> /root
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU serial number disabled.
CPU: After generic, caps: 0383fbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
CPU0: Intel Pentium III (Coppermine) stepping 06
per-CPU timeslice cutoff: 732.17 usecs.
task migration cache decay timeout: 1 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/1 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 1601.53 BogoMIPS
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU serial number disabled.
CPU: After generic, caps: 0383fbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel Pentium III (Coppermine) stepping 06
Total of 2 processors activated (3182.59 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 19.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00178011
....... : max redirection entries: 0017
....... : PRQ implemented: 1
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00 1 0 0 0 0 0 0 00
 01 001 01 0 0 0 0 0 1 1 39
 02 001 01 0 0 0 0 0 1 1 31
 03 001 01 0 0 0 0 0 1 1 41
 04 001 01 0 0 0 0 0 1 1 49
 05 001 01 0 0 0 0 0 1 1 51
 06 001 01 0 0 0 0 0 1 1 59
 07 001 01 0 0 0 0 0 1 1 61
 08 001 01 0 0 0 0 0 1 1 69
 09 001 01 0 0 0 0 0 1 1 71
 0a 001 01 1 1 0 1 0 1 1 79
 0b 001 01 1 1 0 1 0 1 1 81
 0c 001 01 1 1 0 1 0 1 1 89
 0d 001 01 0 0 0 0 0 1 1 91
 0e 001 01 0 0 0 0 0 1 1 99
 0f 001 01 0 0 0 0 0 1 1 A1
 10 000 00 1 0 0 0 0 0 0 00
 11 000 00 1 0 0 0 0 0 0 00
 12 000 00 1 0 0 0 0 0 0 00
 13 000 00 1 0 0 0 0 0 0 00
 14 000 00 1 0 0 0 0 0 0 00
 15 000 00 1 0 0 0 0 0 0 00
 16 000 00 1 0 0 0 0 0 0 00
 17 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 802.0824 MHz.
..... host bus clock speed is 133.0803 MHz.
checking TSC synchronization across 2 CPUs: passed.
Starting migration thread for cpu 0
Bringing up 1
CPU 1 IS NOW UP!
Starting migration thread for cpu 1
CPUS done 2
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
mtrr: v2.0 (20020519)
mtrr: your CPUs had inconsistent variable MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
PCI: PCI BIOS revision 2.10 entry at 0xfb250, last bus=1
PCI: Using configuration type 1
BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 256 entries (12 bytes)
biovec pool[1]: 4 bvecs: 256 entries (48 bytes)
biovec pool[2]: 16 bvecs: 256 entries (192 bytes)
biovec pool[3]: 64 bvecs: 256 entries (768 bytes)
biovec pool[4]: 128 bvecs: 256 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 256 entries (3072 bytes)
Linux Plug and Play Support v0.95 (c) Adam Belay
pnp: the driver 'system' has been registered
PnPBIOS: Found PnP BIOS installation structure at 0xc00fbcc0
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xbcf0, dseg 0xf0000
pnp: match found with the PnP device '00:07' and the driver 'system'
pnp: match found with the PnP device '00:08' and the driver 'system'
PnPBIOS: 14 nodes reported by PnP BIOS; 14 recorded by driver
block request queues:
 128 requests per read queue
 128 requests per write queue
 8 requests per batch
 enter congestion at 15
 exit congestion at 17
drivers/usb/core/usb.c: registered new driver usbfs
drivers/usb/core/usb.c: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router VIA [1106/0686] at 00:07.0
PCI->APIC IRQ transform: (B0,I7,P3) -> 12
PCI->APIC IRQ transform: (B0,I7,P3) -> 12
PCI->APIC IRQ transform: (B0,I7,P2) -> 10
PCI->APIC IRQ transform: (B0,I9,P0) -> 11
PCI->APIC IRQ transform: (B0,I13,P0) -> 11
IA-32 Microcode Update Driver: v1.11 <tigran@veritas.com>
Starting balanced_irq
Enabling SEP on CPU 1
Enabling SEP on CPU 0
Journalled Block Device driver loaded
PCI: Enabling Via external APIC routing
pty: 256 Unix98 ptys configured
Real Time Clock Driver v1.11
Serial: 8250/16550 driver $Revision: 1.90 $ IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
pnp: the driver 'serial' has been registered
pnp: match found with the PnP device '00:0d' and the driver 'serial'
pnp: match found with the PnP device '00:0e' and the driver 'serial'
pnp: the driver 'parport_pc' has been registered
pnp: match found with the PnP device '00:10' and the driver 'parport_pc'
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE]
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
parport_pc: Via 686A parallel port: io=0x378
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1
    ide0: BM-DMA at 0xc000-0xc007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xc008-0xc00f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 6Y060L0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: LG CD-ROM CRD-8522B, ATAPI CD/DVD-ROM drive
hdd: ST340016A, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: host protected area => 1
hda: 120103200 sectors (61493 MB) w/2048KiB Cache, CHS=119150/16/63, UDMA(100)
 hda: hda1 < hda5 hda6 > hda2 hda3 hda4
hdd: host protected area => 1
hdd: 78165360 sectors (40021 MB) w/2048KiB Cache, CHS=77545/16/63, UDMA(33)
 hdd: hdd1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
Advanced Linux Sound Architecture Driver Version 0.9.0rc7 (Sat Feb 15 15:01:21 2003 UTC).
request_module: failed /sbin/modprobe -- snd-card-0. error = -16
no UART detected at 0xffff
ALSA sound/drivers/mpu401/mpu401.c:76: specify port
PCI: Setting latency timer of device 00:07.5 to 64
ALSA device list:
  #0: VIA 82C686A/B rev50 at 0xcc00, irq 10
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 1024 buckets, 16Kbytes
TCP: Hash tables configured (established 8192 bind 10922)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 160k freed
Adding 530104k swap on /dev/hda6. Priority:-1 extents:1
EXT3 FS 2.4-0.9.16, 02 Dec 2001 on ide0(3,5), internal journal
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
00:0d.0: 3Com PCI 3c905C Tornado at 0xe000. Vers LK1.1.19
lp0: using parport0 (polling).
lp0: console ready
hdc: ATAPI 48X CD-ROM drive, 128kB Cache, DMA
Uniform CD-ROM driver Revision: 3.12
hdc: drive_cmd: status=0x01 { Error }
hdc: drive_cmd: error=0x04Aborted Command
[drm] Initialized tdfx 1.0.0 20010216 on minor 0
CSLIP: code copyright 1989 Regents of the University of California
PPP generic driver version 2.4.2
Module ppp_async cannot be unloaded due to unsafe usage in include/linux/module.h:457
PPP BSD Compression module registered
Module bsd_comp cannot be unloaded due to unsafe usage in include/linux/module.h:457
PPP Deflate Compression module registered
Module ppp_deflate cannot be unloaded due to unsafe usage in include/linux/module.h:457
Module af_packet cannot be unloaded due to unsafe usage in include/linux/module.h:457
eth0: Setting half-duplex based on MII #24 link partner capability of 0000.
eth0: Setting full-duplex based on MII #24 link partner capability of 01e1.
ALSA sound/pci/via82xx.c:676: invalid via82xx_cur_ptr, using last valid pointer
ALSA sound/core/pcm_lib.c:156: Unexpected hw_pointer value (stream = 0, delta: -64, max jitter = 4064): wrong interrupt acknowledge?



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Mar 23 2003 - 22:00:26 EST