Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)

From: Jay Cliburn
Date: Tue May 06 2008 - 12:02:46 EST


On Mon, 5 May 2008 01:15:07 +0400
Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:

> Looking at how other netdevice drivers handle this:
>
> 8139too and others check netif_running() in the interrupt handler.
>
> r8169 has scary "50k$" question comment re irqs disabled after
> interacting with hardware.
>
> But the r8169 case should be fixed by atlx_irq_disable()?
>
> Writes to REG_IMR, REG_ISR are commented out in atl1_reset_hw(), why?
> (I'll test that soon)

I've tried all the stuff you mentioned above, and more, to prevent the
memory corruption, all to no avail.

I booted with mem=4000M and didn't hit the bug. I diffed dmesg between
booting with mem=4000M and booting without it, and found that the IOMMU
was being disabled when booting with full memory:

--- dmesg-4000.txt 2008-05-06 10:14:07.000000000 -0500
+++ dmesg-4096.txt 2008-05-06 10:09:19.000000000 -0500
@@ -1,5 +1,5 @@
Linux version 2.6.26-rc1 (jcliburn@xxxxxxxxxxxxxxxxxx) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-27)) #4 SMP Mon May 5 18:03:48 CDT 2008
-Command line: ro root=LABEL=/1 console=ttyS0,38400 console=tty0 slub_debug=FZPU mem=4000M
+Command line: ro root=LABEL=/1 console=ttyS0,38400 console=tty0 slub_debug=FZPU
[...]
+Looks like a VIA chipset. Disabling IOMMU. Override with iommu=allowed
[...]

So I then booted with iommu=allowed. No errors. Can't hit the bug to
save my life.

Why would disabling iommu cause the atl1 driver to write over poisoned
memory?
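
One possible reading, going by the subject line and not confirmed anywhere
in this thread: with no IOMMU to remap buffers, a 64-bit bus address above
4 GiB could end up truncated to 32 bits somewhere between the driver and the
device, so the DMA lands on an unrelated low page. The sketch below is plain
userspace C that only illustrates that arithmetic. The helper names low32()
and fits_32bit() are hypothetical and are not taken from the atl1 driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the address a device would use if only the low
 * 32 bits of a 64-bit bus address reached it. Illustration only; this
 * is an assumption about the failure mode, not atl1 code.
 */
uint32_t low32(uint64_t bus_addr)
{
	return (uint32_t)(bus_addr & 0xffffffffULL);
}

/* True if the address needs no more than 32 bits, so truncation is harmless. */
bool fits_32bit(uint64_t bus_addr)
{
	return (bus_addr >> 32) == 0;
}
```

With mem=4000M every usable physical address is below 0xFA000000, so
fits_32bit() holds for all of them and truncation changes nothing. A buffer
at, say, 0x100001000 (just above 4 GiB) would be seen as 0x00001000 after
truncation, which would be consistent with writes landing on poisoned
memory. Whether this is what actually happens here is still an open question.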

Alexey, can you please try booting with iommu=allowed and see if you
avoid the problem?

Thanks,
Jay