Re: 3.1-rc4: spectacular kernel errors / filesystem crash

From: Jon Mason
Date: Tue Sep 13 2011 - 11:51:08 EST


On Tue, Sep 13, 2011 at 10:42 AM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
>
>
> On Tue, 13 Sep 2011, Jon Mason wrote:
>
>> On Tue, Sep 13, 2011 at 9:54 AM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx>
>> wrote:
>>>
>>>
>>> On Tue, 13 Sep 2011, Eric Dumazet wrote:
>>>
>>>> Please Justin make sure you pulled commit
>>>> commit ed2888e906b56769b4ffabb9c577190438aa68b8
>>>> Author: Jon Mason <mason@xxxxxxxx>
>>>> Date:   Thu Sep 8 16:41:18 2011 -0500
>>>>
>>>>   PCI: Remove MRRS modification from MPS setting code
>>>>
>>>>   Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has
>>>>   massive negative ramifications on some devices.  Without knowing which
>>>>   devices have this issue, do not modify from the default value when
>>>>   walking the PCI-E bus in pcie_bus_safe mode.  Also, make pcie_bus_safe
>>>>   the default procedure.
>>>>
>>>>   Tested-by: Sven Schnelle <svens@xxxxxxxxxxxxxx>
>>>>   Tested-by: Simon Kirby <sim@xxxxxxxxxx>
>>>>   Tested-by: Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx>
>>>>   Reported-and-tested-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
>>>>   Reported-and-tested-by: Niels Ole Salscheider
>>>> <niels_ole@salscheider-online.
>>>>   References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
>>>>   Signed-off-by: Jon Mason <mason@xxxxxxxx>
>>>>   Acked-by: Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx>
>>>>   Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>>
>>> Hello,
>>>
>>> I found this commit here:
>>> http://permalink.gmane.org/gmane.linux.kernel.pci/11700
>>
>> This is an early version of the patch.  This is the patch that you want:
>>
>> https://github.com/torvalds/linux/commit/ed2888e906b56769b4ffabb9c577190438aa68b8
>>
>> It appears that this patch didn't make it to lkml or linux-pci list
>> due to kernel.org DNS being down when it was sent.
>>
>> Thanks,
>> Jon
>
> I need to learn how to use git at some point, can you please provide plain
> text patches so I can apply them and reboot?
>
> Justin.

I've attached the 2 patches I asked Linus to include into 3.1-rc6.
Let me know if there are any issues.

Thanks,
Jon
From cf822aed99fd8851d82ae5f2df11c29b79e316c8 Mon Sep 17 00:00:00 2001
From: Shyam Iyer <shyam.iyer.t@xxxxxxxxx>
Date: Wed, 31 Aug 2011 12:21:42 -0400
Subject: [PATCH 1/2] Fix pointer dereference before call to
pcie_bus_configure_settings

There is a potential NULL pointer dereference in calls to
pcie_bus_configure_settings due to attempts to access pci_bus self
variables when the self pointer is NULL. To correct this, verify that
the self pointer in pci_bus is non-NULL before dereferencing it.

Reported-by: Stanislaw Gruszka <sgruszka@xxxxxxxxxx>
Signed-off-by: Shyam Iyer <shyam_iyer@xxxxxxxx>
Signed-off-by: Jon Mason <mason@xxxxxxxx>
---
arch/x86/pci/acpi.c | 9 +++++++--
drivers/pci/hotplug/pcihp_slot.c | 4 +++-
drivers/pci/probe.c | 3 ---
3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index c953302..039d913 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -365,8 +365,13 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
*/
if (bus) {
struct pci_bus *child;
- list_for_each_entry(child, &bus->children, node)
- pcie_bus_configure_settings(child, child->self->pcie_mpss);
+ list_for_each_entry(child, &bus->children, node) {
+ struct pci_dev *self = child->self;
+ if (!self)
+ continue;
+
+ pcie_bus_configure_settings(child, self->pcie_mpss);
+ }
}

if (!bus)
diff --git a/drivers/pci/hotplug/pcihp_slot.c b/drivers/pci/hotplug/pcihp_slot.c
index 753b21a..3ffd9c1 100644
--- a/drivers/pci/hotplug/pcihp_slot.c
+++ b/drivers/pci/hotplug/pcihp_slot.c
@@ -169,7 +169,9 @@ void pci_configure_slot(struct pci_dev *dev)
(dev->class >> 8) == PCI_CLASS_BRIDGE_PCI)))
return;

- pcie_bus_configure_settings(dev->bus, dev->bus->self->pcie_mpss);
+ if (dev->bus && dev->bus->self)
+ pcie_bus_configure_settings(dev->bus,
+ dev->bus->self->pcie_mpss);

memset(&hpp, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, &hpp);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8473727..0820fc1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1456,9 +1456,6 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
{
u8 smpss = mpss;

- if (!bus->self)
- return;
-
if (!pci_is_pcie(bus->self))
return;

--
1.7.6

From 74d81235f8e4bd60859d539a27e51d3a09d183cf Mon Sep 17 00:00:00 2001
From: Jon Mason <mason@xxxxxxxx>
Date: Thu, 8 Sep 2011 12:59:00 -0500
Subject: [PATCH 2/2] PCI: Remove MRRS modification from MPS setting code

Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has
massive negative ramifications on some devices. Without knowing which
devices have this issue, do not modify from the default value when
walking the PCI-E bus in pcie_bus_safe mode. Also, make pcie_bus_safe
the default procedure.

Tested-by: Sven Schnelle <svens@xxxxxxxxxxxxxx>
Tested-by: Simon Kirby <sim@xxxxxxxxxx>
Tested-by: Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx>
Reported-and-tested-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Reported-and-tested-by: Niels Ole Salscheider <niels_ole@xxxxxxxxxxxxxxxxxxxxx>
References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
Signed-off-by: Jon Mason <mason@xxxxxxxx>
---
drivers/pci/pci.c | 2 +-
drivers/pci/probe.c | 41 ++++++++++++++++++++++-------------------
2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0ce6742..4e84fd4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -77,7 +77,7 @@ unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE;
unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE;
unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE;

-enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PERFORMANCE;
+enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_SAFE;

/*
* The default CLS is used if arch didn't set CLS explicitly and not
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0820fc1..b1187ff 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1396,34 +1396,37 @@ static void pcie_write_mps(struct pci_dev *dev, int mps)

static void pcie_write_mrrs(struct pci_dev *dev, int mps)
{
- int rc, mrrs;
+ int rc, mrrs, dev_mpss;

- if (pcie_bus_config == PCIE_BUS_PERFORMANCE) {
- int dev_mpss = 128 << dev->pcie_mpss;
+ /* In the "safe" case, do not configure the MRRS. There appear to be
+ * issues with setting MRRS to 0 on a number of devices.
+ */

- /* For Max performance, the MRRS must be set to the largest
- * supported value. However, it cannot be configured larger
- * than the MPS the device or the bus can support. This assumes
- * that the largest MRRS available on the device cannot be
- * smaller than the device MPSS.
- */
- mrrs = mps < dev_mpss ? mps : dev_mpss;
- } else
- /* In the "safe" case, configure the MRRS for fairness on the
- * bus by making all devices have the same size
- */
- mrrs = mps;
+ if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
+ return;
+
+ dev_mpss = 128 << dev->pcie_mpss;

+ /* For Max performance, the MRRS must be set to the largest supported
+ * value. However, it cannot be configured larger than the MPS the
+ * device or the bus can support. This assumes that the largest MRRS
+ * available on the device cannot be smaller than the device MPSS.
+ */
+ mrrs = min(mps, dev_mpss);

/* MRRS is a R/W register. Invalid values can be written, but a
- * subsiquent read will verify if the value is acceptable or not.
+ * subsequent read will verify if the value is acceptable or not.
* If the MRRS value provided is not acceptable (e.g., too large),
* shrink the value until it is acceptable to the HW.
*/
while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) {
+ dev_warn(&dev->dev, "Attempting to modify the PCI-E MRRS value"
+ " to %d. If any issues are encountered, please try "
+ "running with pci=pcie_bus_safe\n", mrrs);
rc = pcie_set_readrq(dev, mrrs);
if (rc)
- dev_err(&dev->dev, "Failed attempting to set the MRRS\n");
+ dev_err(&dev->dev,
+ "Failed attempting to set the MRRS\n");

mrrs /= 2;
}
@@ -1436,13 +1439,13 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
if (!pci_is_pcie(dev))
return 0;

- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));

pcie_write_mps(dev, mps);
pcie_write_mrrs(dev, mps);

- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));

return 0;
--
1.7.6