Re: [PATCH 2/2] PCI: fix system hang issue of Marvell SATA host controller

From: Bjorn Helgaas
Date: Fri Mar 08 2013 - 12:01:35 EST


On Thu, Mar 7, 2013 at 11:51 PM, Xiangliang Yu <yuxiangl@xxxxxxxxxxx> wrote:
> Hi, Bjorn
>
>> >> > Fix a system hang: if a resource file for BAR0 ~ BAR4 is accessed
>> >> > first, the system hangs when lspci is executed afterwards.
>> >>
>> >> This needs more explanation. We've already read the BARs by the time
>> >> header quirks are run, so apparently it's not just the mere act of
>> >> accessing a BAR that causes a hang.
>> >>
>> >> We need to know exactly what's going on here. For example, do BARs
>> >> 0-4 exist? Does the device decode accesses to the regions described
>> >> by the BARs? The PCI core has to know what resources the device uses,
>> >> so if the device decodes accesses, we can't just throw away the
>> >> start/end information.
>> > BARs 0-4 exist and the PCI device has I/O space enabled, but if a user
>> > accesses the region files with the udevadm command's info option, the
>> > system hangs, e.g.: udevadm info --attribute-walk
>> > --path=/sys/devices/pci-device/000:*.
>> > Because the device is just an AHCI host controller, it doesn't need the
>> > BAR0 ~ BAR4 region files.
>> > Is my explanation OK for the patch?
>>
>> No, I still don't know what causes the hang; I only know that udevadm
>> can trigger it. I don't want to just paper over the problem until we
>> know what the root cause is.
>>
>> Does "lspci -H1 -vv" also cause a hang? What about "setpci -s<dev>
>> BASE_ADDRESS_0"? "setpci -H1 -s<dev> BASE_ADDRESS_0"?
> Those commands are OK; the problem is that once an I/O port has been accessed, the commands can no longer find the device.
> The root cause is that accessing the I/O ports makes the chip go bad. So the point of the patch is to stop exposing the ability to access those I/O ports.

Ah, so the problem is not with accessing the BAR in config space. The
problem is with accessing the I/O port space mapped by the BAR. Is
that right?
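To make the config-space versus I/O-space distinction concrete: reading a BAR register (what "setpci BASE_ADDRESS_0" does) only touches configuration space, while the hang apparently comes from touching the I/O ports the BAR points at. The sketch below assumes a raw little-endian dump of the type 0 header (for example the first 64 bytes of the sysfs "config" file) and decodes BARn from it without ever touching the decoded range. The helper names cfg_read_bar(), bar_is_io() and bar_base() are hypothetical, and 64-bit memory BARs (whose upper half lives in the next BAR) are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of BAR0 in a type 0 PCI configuration header; BARn is at 0x10 + 4*n. */
#define PCI_BAR0_OFFSET 0x10

/* Hypothetical helper: read BARn (0..5) from a raw config-space dump. */
static uint32_t cfg_read_bar(const uint8_t *cfg, int n)
{
	assert(n >= 0 && n <= 5);
	const uint8_t *p = cfg + PCI_BAR0_OFFSET + 4 * n;

	/* Config space is little-endian. */
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Bit 0 set means the BAR describes I/O port space, clear means memory. */
static bool bar_is_io(uint32_t bar)
{
	return (bar & 0x1u) != 0;
}

/* Strip the flag bits: two low bits for I/O BARs, four for memory BARs. */
static uint32_t bar_base(uint32_t bar)
{
	return bar_is_io(bar) ? (bar & ~0x3u) : (bar & ~0xFu);
}
```

Only the dump is inspected here; nothing in this sketch reads or writes the ports or memory that a BAR describes.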

Does "udevadm info --attribute-walk" really access the device address
space mapped by the BARs? That seems surprising to me, and I don't
see any indication of it when I try it on an AHCI device on my system:

# udevadm info --attribute-walk --path=/sys/devices/pci0000:00/0000:00:1f.2

Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

looking at device '/devices/pci0000:00/0000:00:1f.2':
KERNEL=="0000:00:1f.2"
SUBSYSTEM=="pci"
DRIVER=="ahci"
ATTR{irq}=="40"
ATTR{subsystem_vendor}=="0x17aa"
ATTR{broken_parity_status}=="0"
ATTR{class}=="0x010601"
ATTR{consistent_dma_mask_bits}=="64"
ATTR{dma_mask_bits}=="64"
ATTR{local_cpus}=="00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f"
ATTR{device}=="0x3b2f"
ATTR{enable}=="1"
ATTR{msi_bus}==""
ATTR{local_cpulist}=="0-3"
ATTR{vendor}=="0x8086"
ATTR{subsystem_device}=="0x2168"
ATTR{numa_node}=="-1"

looking at parent device '/devices/pci0000:00':
KERNELS=="pci0000:00"
SUBSYSTEMS==""
DRIVERS==""
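The attribute list above contains no per-BAR entries. To check which BARs the PCI core has recorded as I/O without touching the device, the sysfs "resource" text file may be enough: as far as I know it is formatted from the kernel's own struct resource data, one "start end flags" line per BAR, in hex. A minimal parser, assuming that format and the IORESOURCE_IO/IORESOURCE_MEM values from include/linux/ioport.h; struct bar_info, parse_resource_line() and resource_is_io() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed flag values, taken from include/linux/ioport.h. */
#define IORESOURCE_IO  0x00000100ULL
#define IORESOURCE_MEM 0x00000200ULL

/* Hypothetical container for one line of the sysfs "resource" file. */
struct bar_info {
	unsigned long long start, end, flags;
};

/* Parse "0x<start> 0x<end> 0x<flags>"; %llx accepts the 0x prefix. */
static bool parse_resource_line(const char *line, struct bar_info *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, "%llx %llx %llx",
		      &out->start, &out->end, &out->flags) == 3;
}

/* True if the core has marked this range as I/O port space. */
static bool resource_is_io(const struct bar_info *b)
{
	return (b->flags & IORESOURCE_IO) != 0;
}
```

Usage: open /sys/bus/pci/devices/<dev>/resource, read it with fgets(), and call parse_resource_line() on each of the first six lines (BAR0 to BAR5).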

>> >> > ---
>> >> > drivers/pci/quirks.c | 15 +++++++++++++++
>> >> > 1 files changed, 15 insertions(+), 0 deletions(-)
>> >> >
>> >> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> >> > index 0369fb6..d49f8dc 100644
>> >> > --- a/drivers/pci/quirks.c
>> >> > +++ b/drivers/pci/quirks.c
>> >> > @@ -44,6 +44,21 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
>> >> > DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
>> >> > PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
>> >> >
>> >> > +/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
>> >> > + * by IO resource file, and need to skip the files
>> >> > + */
>> >> > +static void quirk_marvell_mask_bar(struct pci_dev *dev)
>> >> > +{
>> >> > +	int i;
>> >> > +
>> >> > +	for (i = 0; i < 5; i++)
>> >> > +		if (dev->resource[i].start)
>> >> > +			dev->resource[i].start =
>> >> > +				dev->resource[i].end = 0;
>> >> > +}
>> >> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
>> >> > +			quirk_marvell_mask_bar);
>> >> > +
>> >> > /* The Mellanox Tavor device gives false positive parity errors
>> >> > * Mark this device with a broken_parity_status, to allow
>> >> > * PCI scanning code to "skip" this now blacklisted device.
>> >> > --
>> >> > 1.7.5.4
>> >> >
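A userspace model of what the quirk does to the PCI core's view of the device may help show why zeroing start/end is a concern. It assumes a hypothetical struct fake_resource holding only the start/end fields of struct resource, and a length rule modelled on pci_resource_len(), which treats start == end == 0 as an empty BAR. Under that assumption BARs 0-4 become empty, so the core no longer records which I/O ranges the device decodes, even though, per the report above, the device still has I/O space enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end fields of the kernel's struct resource. */
struct fake_resource {
	uint64_t start, end;
};

/* Length rule modelled on pci_resource_len(): start == end == 0 means empty. */
static uint64_t fake_resource_len(const struct fake_resource *r)
{
	if (r->start == 0 && r->end == 0)
		return 0;
	assert(r->end >= r->start);
	return r->end - r->start + 1;
}

/* Mirrors the loop in quirk_marvell_mask_bar(): any of BARs 0..4 with a
 * non-zero start gets start = end = 0; BAR5 is left alone. */
static void mask_bars_0_to_4(struct fake_resource res[6])
{
	for (int i = 0; i < 5; i++)
		if (res[i].start)
			res[i].start = res[i].end = 0;
}
```

After mask_bars_0_to_4(), fake_resource_len() reports 0 for BARs 0-4 while BAR5 keeps its range.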
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/