Re: [PATCH 1/1] PCI: dwc: Use regular interrupt instead of chained

From: Radu Rendec
Date: Thu Jun 29 2023 - 17:28:55 EST


On Thu, 2023-06-29 at 15:58 -0500, Bjorn Helgaas wrote:
> On Thu, Jun 29, 2023 at 04:42:07PM -0400, Radu Rendec wrote:
> > On Thu, 2023-06-29 at 14:57 -0500, Bjorn Helgaas wrote:
> > > On Thu, Jun 29, 2023 at 02:30:19PM -0400, Radu Rendec wrote:
> > > > The DesignWare PCIe host driver uses a chained interrupt to demultiplex
> > > > the downstream MSI interrupts. On Qualcomm SA8540P Ride, enabling both
> > > > pcie2a and pcie3a at the same time can create an interrupt storm where
> > > > the parent interrupt fires continuously, even though reading the PCIe
> > > > host registers doesn't identify any child MSI interrupt source. This
> > > > effectively locks up CPU0, which spends all the time servicing these
> > > > interrupts.
> > > >
> > > > This is a clear example of how bypassing the interrupt core by using
> > > > chained interrupts can be very dangerous if the hardware misbehaves.
> > > >
> > > > Convert the driver to use a regular interrupt for the demultiplex
> > > > handler. This allows the interrupt storm detector to detect the faulty
> > > > interrupt and disable it, allowing the system to run normally.
> > >
> > > There are many other users of irq_set_chained_handler_and_data() in
> > > drivers/pci/controller/.  Should they be similarly converted?  If not,
> > > how do we decide which need to use irq_set_chained_handler_and_data()
> > > and which do not?
> >
> > According to Thomas Gleixner, yes. Obviously I don't want to put words
> > in his mouth, but I think that's the gist of what he said in a reply to
> > an RFC patch that I sent a few weeks ago:
> > https://lore.kernel.org/all/877csohcll.ffs@tglx/
>
> Is it's a bug in pcie-designware-host.c, and it's also a bug in the
> other callers, we should fix them all.

I wouldn't call it a bug, because the driver works as long as the
hardware behaves. But if the hardware misbehaves and generates an
interrupt storm for whatever reason, you end up with an apparently
frozen system and no clue as to what's going on. By contrast, using
regular interrupts allows the interrupt storm detector to kick in. Then
the system works fine, and there's also a useful log message pointing
to that particular interrupt.

> But you do imply that there's some DesignWare hardware issue involved,
> too, so I guess it's possible the other drivers don't have an issue
> and/or actually require the chained IRQs.  That's why I asked how we
> should decide.

That makes sense. I don't know if it's a DesignWare hardware issue or
something else down the PCIe bus. Other folks in my team are
investigating the root cause. This thread has the details:
https://lore.kernel.org/all/pmodcoakbs25z2a7mlo5gpuz63zluh35vbgb5itn6k5aqhjnny@jvphbpvahtse/

I want to point out that this patch doesn't *fix* the root problem (the
interrupt storm). It just makes the kernel handle it (much) better if
it occurs.

I can't see why any driver would _require_ chained IRQs. As I see it,
chained interrupts are just a "shortcut" that circumvents the IRQ core
for (debatable) performance reasons. Other than that, they should work
the same as regular interrupts.

There are other benefits associated with using regular interrupts. One
of them is the ability to control the SMP affinity of the parent
interrupt. But that's a different topic.

Radu

> > > > -static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
> > > > +static void __dw_pcie_free_msi(struct dw_pcie_rp *pp, u32 num_ctrls)
> > > >  {
> > > >         u32 ctrl;
> > > >  
> > > > -       for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++) {
> > > > +       for (ctrl = 0; ctrl < num_ctrls; ctrl++) {
> > > >                 if (pp->msi_irq[ctrl] > 0)
> > > > -                       irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
> > > > -                                                        NULL, NULL);
> > > > +                       free_irq(pp->msi_irq[ctrl], pp);
> > > >         }
> > > >  
> > > >         irq_domain_remove(pp->msi_domain);
> > > >         irq_domain_remove(pp->irq_domain);
> > > >  }
> > > >  
> > > > +#define dw_pcie_free_msi(pp) __dw_pcie_free_msi(pp, MAX_MSI_CTRLS)
> > >
> > > What is the benefit of the dw_pcie_free_msi() macro?
> >
> > It allows me to add the num_ctrls parameter to the corresponding
> > function (now renamed to __dw_pcie_free_msi()) without forcing all the
> > existing call sites to send MAX_MSI_CTRLS explicitly.
> >
> > I needed that extra parameter to avoid duplicating the tear down code
> > on the (new) error path in dw_pcie_msi_init() - see below.
> >
> > > >  static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
> > > >  {
> > > >         struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > @@ -361,9 +353,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > >                 return ret;
> > > >  
> > > >         for (ctrl = 0; ctrl < num_ctrls; ctrl++) {
> > > > -               if (pp->msi_irq[ctrl] > 0)
> > > > -                       irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
> > > > -                                                   dw_chained_msi_isr, pp);
> > > > +               if (pp->msi_irq[ctrl] > 0) {
> > > > +                       ret = request_irq(pp->msi_irq[ctrl], dw_pcie_msi_isr, 0,
> > > > +                                         dev_name(dev), pp);
> > > > +                       if (ret) {
> > > > +                               dev_err(dev, "Failed to request irq %d: %d\n",
> > > > +                                       pp->msi_irq[ctrl], ret);
> > > > +                               __dw_pcie_free_msi(pp, ctrl);
> >
> > This is where I'm using the extra parameter. If we fail to request an
> > interrupt, we need to free all the other interrupts that we have
> > requested so far, to leave everything in a clean state. But we can't
> > use MAX_MSI_CTRLS with __dw_pcie_free_msi() and rely on the check there
> > because there may be extra interrupts that we haven't requested *yet*
> > and we would attempt to free them.
>
> Makes sense, thanks.
>
> Bjorn
>