Re: 2.6.25-rc8-mm1: Intel SATA boot failure

From: Tilman Schmidt
Date: Wed Apr 09 2008 - 10:31:30 EST


This is taking longer than I hoped, so here's a little progress report.

On Thu, 3 Apr 2008 16:17:33 -0700, Andrew Morton wrote:
On Fri, 04 Apr 2008 01:08:19 +0200
Tilman Schmidt <tilman@xxxxxxx> wrote:

This fails to come up on my development machine, apparently because it
has trouble accessing the SATA hard disks.
Hardware: Intel Pentium D940, Intel DQ965GF board, two SATA hard disks.
Some unusual things I noticed during the boot process:

- a message "doing fast boot" that looked unfamiliar; unfortunately
it scrolled off too quickly to note its context

That message doesn't make it into dmesg. It's apparently a Suse thing,
and perhaps normal, so let's ignore that for now.

- for each of the two SATA ports in use, a message
"SATA port is slow to respond, please be patient"
accompanied by about 10 secs wait

These messages seem to be a separate issue. I also get them with
a .config that otherwise brings up the system successfully. That
allowed me to capture a dmesg, so here are some possibly interesting
hunks of the diff between a mainline kernel and a working 2.6.25-rc8-mm1
one:

--- dmesg-2.6.25-rc8-git.nots-reordered 2008-04-09 15:29:52.000000000 +0200
+++ dmesg-2.6.25-rc8-mm1.nots 2008-04-09 00:48:42.000000000 +0200
@@ -1,4 +1,4 @@
- Linux version 2.6.25-rc8-testing-00210-g51ac03f (ts@xenon) (gcc version 4.2.1 (SUSE Linux)) #37 SMP PREEMPT Wed Apr 9 01:27:07 CEST 2008
+ Linux version 2.6.25-rc8-mm1-testing (ts@xenon) (gcc version 4.2.1 (SUSE Linux)) #6 SMP PREEMPT Wed Apr 9 00:24:23 CEST 2008
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)

[...]

@@ -244,12 +277,10 @@
CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) D CPU 3.20GHz stepping 04
- Total of 2 processors activated (12796.06 BogoMIPS).
+ Total of 2 processors activated (12796.87 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
- checking TSC synchronization [CPU#0 -> CPU#1]:
- Measured 560 cycles TSC warp between CPUs, turning off TSC clock.
- Marking TSC unstable due to: check_tsc_sync_source failed.
+ checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
CPU0 attaching sched-domain:
domain 0: span 03

[Nice - at last a kernel that likes my TSC; not sure if it matters though.]

@@ -846,26 +880,36 @@
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi0 : ahci
PM: Adding info for No Bus:host0
+ PM: Adding info for No Bus:host0
scsi1 : ahci
PM: Adding info for No Bus:host1
+ PM: Adding info for No Bus:host1
scsi2 : ahci
PM: Adding info for No Bus:host2
+ PM: Adding info for No Bus:host2
scsi3 : ahci
PM: Adding info for No Bus:host3
+ PM: Adding info for No Bus:host3
scsi4 : ahci
PM: Adding info for No Bus:host4
+ PM: Adding info for No Bus:host4
scsi5 : ahci
PM: Adding info for No Bus:host5
+ PM: Adding info for No Bus:host5
ata1: SATA max UDMA/133 abar m2048@0x92c25000 port 0x92c25100 irq 217
ata2: SATA max UDMA/133 abar m2048@0x92c25000 port 0x92c25180 irq 217
ata3: SATA max UDMA/133 abar m2048@0x92c25000 port 0x92c25200 irq 217
ata4: SATA max UDMA/133 abar m2048@0x92c25000 port 0x92c25280 irq 217
ata5: SATA max UDMA/133 abar m2048@0x92c25000 port 0x92c25300 irq 217
ata6: SATA max UDMA/133 abar m2048@0x92c25000 port 0x92c25380 irq 217
+ ata1: port is slow to respond, please be patient (Status 0x80)
+ ata1: COMRESET failed (errno=-16)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: ST380811AS, 3.AAE, max UDMA/133
ata1.00: 156301488 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
+ ata2: port is slow to respond, please be patient (Status 0x80)
+ ata2: COMRESET failed (errno=-16)
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7: ST380811AS, 3.AAE, max UDMA/133
ata2.00: 156301488 sectors, multi 0: LBA48 NCQ (depth 31/32)
@@ -877,15 +921,22 @@
PM: Adding info for No Bus:target0:0:0
scsi 0:0:0:0: Direct-Access ATA ST380811AS 3.AA PQ: 0 ANSI: 5
PM: Adding info for scsi:0:0:0:0
+ PM: Adding info for No Bus:0:0:0:0
+ PM: Adding info for No Bus:0:0:0:0
PM: Adding info for No Bus:target1:0:0
scsi 1:0:0:0: Direct-Access ATA ST380811AS 3.AA PQ: 0 ANSI: 5
PM: Adding info for scsi:1:0:0:0
+ PM: Adding info for No Bus:1:0:0:0
+ PM: Adding info for No Bus:1:0:0:0
+ modprobe used greatest stack depth: 2200 bytes left
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:02:00.0 to 64
scsi6 : pata_marvell
PM: Adding info for No Bus:host6
+ PM: Adding info for No Bus:host6
scsi7 : pata_marvell
PM: Adding info for No Bus:host7
+ PM: Adding info for No Bus:host7
ata7: PATA max UDMA/100 cmd 0x2018 ctl 0x2024 bmdma 0x2000 irq 17
ata8: DUMMY
BAR5:00:00 01:7F 02:22 03:CA 04:00 05:00 06:00 07:00 08:00 09:00 0A:00 0B:00 0C:01 0D:00 0E:00 0F:00

After that, it quickly diverges because of major differences in the
ordering of device startup. Already, to get the diffs above I had to
shift a few blocks of lines around.

Before I try booting that kernel again, any instructions on what to
watch out for? Is netconsole usable again?

Yes, netconsole is usable again ;)

Good to know. I'm now looking in my netconsole setup for the reason
why I don't receive anything from it.

Thanks,
Tilman

--
Tilman Schmidt E-Mail: tilman@xxxxxxx
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)

Attachment: signature.asc
Description: OpenPGP digital signature