Re: bad pmd filemap.c, oops; 2.4.30 and 2.4.32

From: Chris Stromsoe
Date: Wed Jan 04 2006 - 22:51:28 EST


On Sat, 31 Dec 2005, Willy Tarreau wrote:
On Sat, Dec 31, 2005 at 12:08:21PM +0000, Alan Cox wrote:
On Gwe, 2005-12-30 at 17:48 -0800, Chris Stromsoe wrote:
scsi0:0:0:0: Attempting to queue an ABORT message CDB: 0x12 0x0 0x0 0x0 0xff 0x0 scsi0:0:0:0: Command already completed aic7xxx_abort returns 0x2002

IRQ routing by the look of that trace. Make sure that if you are using 2.4.x you have ACPI disabled and see it looks any better

Correct, and I came to the same conclusion ; Chris told us he booted with the "nosmp" option. I've checked his config, and he has CONFIG_ACPI_BOOT=y. I've just tried the same here, and I confirm that my machine (dual athlon) does not boot with "nosmp" unless I also add "acpi=off". Mine even stops ealier, while scanning IDE devices.

2.6.14.4 has been running stable for 4 days. For the long term, I'll probably migrate the box to 2.6 and leave it there.

So now we're back to the original problem, i.e. why does he get bad pmd
that often on 2.4. It leaves us with the following possible next steps
after the problem occurs again (if it still happens with 2.6.14 or if
Chris is OK for a few more tests) :
- 2.4.32 nosmp acpi=off => the easiest one
- 2.4.32 + aic7xxx+20040522 => the more interesting one

I booted 2.4.32 with the aic7xxx patch you pointed me at last week. It's been up for a few hours. I'll let it run for at least a week or two and will report back positive or negative results. After that, I'll try 2.4.32 with nosmp and acpi=off.


-Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/