Re: Regression: Boot hang sizing transparent PCI-to-PCIbridgesince after 2.6.25-r7.

From: Gary Hade
Date: Wed Nov 12 2008 - 17:11:36 EST


On Wed, Nov 12, 2008 at 09:50:14AM +0100, GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
> > -----Original Message-----
> > From: Gary Hade [mailto:garyhade@xxxxxxxxxx]
> >
> > Correction. We were not trying to address boot-time hangs
> > but I believe we may have been trying to address expansion
> > ROM related PCI resource allocation failures that we were
> > seeing during boot with certain PCI cards. However, I doubt
> > that this is relevent to your problem. The transparent
> > bridge sizing removal change is probably a red herring.
>
> After reading a little bit on how PCI and PCI-to-PCI bridges work, and
> how they're handled in linux (http://tldp.org/LDP/tlk/dd/pci.html), I
> now know that ranges in the bridge (either I/O, mmio or prefetchable
> mem) are simply disabled when start < end, and that's the original
> configuration that the BIOS enforces when the bridge is not sized by
> Linux.
>
> After inserting kprintf()'s, I see that the hang happens actually while
> the positive decoding of the I/O range in the bridge is being activated
> in pci_setup_bridge(), sometime in between the writes to the I/O
> base/limit registers of the bridge; I don't remember exactly which was
> the last pci_Write_config_dword() that allowed the next kprintf() to
> succeed. I'll look at it tonight again, but I suspect that the final
> enabling write (the one that updates the PCI_IO_BASE_UPPER16 register
> with its final value) was the one hanging the machine.
>
> The I/O range being activated was the one in the range 0x1000-0x1fff,
> apparently correctly sized to accomodate the two I/O ranges
> (0x1000-0x10ff, 0x1400-0x14ff) assigned to the CardBus bridge on the
> secondary bus.
>
> One theory is that my system has something actually mapped to that I/O
> range in the root PCI bus. When only subtractive decoding is in place
> (the I/O range isn't activated), access to the secondary bus behind the
> PCI-to-PCI bridge is done when the transaction isn't claimed by any
> device in the root bus, after what the PCI docs describe as a 4-cycle
> timeout. When the I/O range is activated, that range is positively
> decoded by the bridge, which tries to claim the transaction before the
> timeout. Perhaps two devices (the bridge and the unknown device on the
> root bus) conflict when claiming the same transaction?
>
> Another possibility could be that activating the I/O range disables the
> negative decoding in the secondary-to-primary sense of the bridge for
> that I/O range. Perhaps some device behind the bridge depends on being
> able to forward transactions to the primary bus on that I/O range, but
> it's disallowed after the range is configured. For me this seems rather
> unlikely, because of the nature of the devices behind the bridge.
>
> I'll look at it more closely again, and I will test whether commenting
> out the I/O range sizing (leaving the other ranges to be sized) is
> enough to allow the system to run. If so, is there any way to use a
> system-specific quirk in order to remove the PCI-to-PCI bridge I/O range
> from being sized/activated?

Yes, there are already examples of this sort of thing in the PCI core.
See drivers/pci/quirks.c. Of course, you will likely have to present
a very good argument as to why such a change is really necessary.

BTW, I just noticed that you have not been copying linux-pci@xxxxxxxxxxxxxxx
or the PCI maintainer Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> where you
will probably get more seasoned advise.

Gary

--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
garyhade@xxxxxxxxxx
http://www.ibm.com/linux/ltc

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/