Re: [PATCH] Limit VPD length on Atheros wifi cards

From: Bjorn Helgaas
Date: Fri Jan 08 2016 - 20:06:03 EST


[+cc Hannes, Michal, Shane, Myron, Venkat; sorry I missed you guys the
first time around]

On Fri, Jan 08, 2016 at 06:57:36PM -0600, Bjorn Helgaas wrote:
> [+cc Babu]
>
> Hi Jordan, Babu, et al,
>
> On Tue, Dec 29, 2015 at 04:19:02PM -0600, Jordan Hargrave wrote:
> > Attempt to read VPD on these cards causes kernel hang or delay
> >
> > Signed-off-by: Jordan Hargrave <Jordan_Hargrave@xxxxxxxx>
> > ---
> > drivers/pci/quirks.c | 10 ++++++++++
> > 1 files changed, 10 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index e6376f6..a72667f 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -2161,6 +2161,16 @@ static void quirk_brcm_570x_limit_vpd(struct pci_dev *dev)
> > }
> > }
> >
> > +static void quirk_atheros_vpd(struct pci_dev *dev)
> > +{
> > + /* Atheros WiFi cards hang when reading VPD */
> > + if (dev->vpd)
> > + dev->vpd->len = 0;
> > +}
> > +
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATHEROS, PCI_ANY_ID, quirk_atheros_vpd);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_atheros_vpd);
>
> We're accumulating several quirks like this, and they don't do
> anything device-specific, so we could probably use a single quirk for
> all of them, i.e., Babu's quirk_megaraid_sas_limit_vpd() and your
> quirk_atheros_vpd() are needlessly duplicated.
>
> I'd really like to have more details about what the "kernel hang or
> delay" that happens when we try to read VPD.
>
> Looking at pci_vpd_pci22_read(), it basically just does config
> accesses, so I only see two ways things could go wrong:
>
> 1) If the device responds with data, but the PCI_VPD_ADDR_F flag
> isn't set, we should be able to timeout gracefully.
>
> 2) If the device doesn't respond at all, we could see machine
> checks, master aborts, processor read timeouts, or similar hardware
> problems.
>
> If there are other failure modes, I'd like to learn what they are.
>
> If we're seeing problems like (1), there must be something we can
> improve in the way we handle timeouts.
>
> If we're seeing failures like (2), they're probably caused by a
> hardware or firmware problem on the device, and there's not much we
> can do in the kernel, so disabling VPD access completely is probably
> all we can do.
>
> So if/when we add quirks like this, I'm looking for bugzilla entries
> that clearly show that the problem is (2). I'd also like to log
> something in dmesg as a clue about why lspci or other utilities don't
> find any VPD data.
>
> Bjorn
>
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_BROADCOM,
> > PCI_DEVICE_ID_NX2_5706,
> > quirk_brcm_570x_limit_vpd);