Re: Why set .suppress_bind_attrs even though .remove() implemented?

From: Marc Zyngier
Date: Mon Jul 25 2022 - 10:43:49 EST


On Mon, 25 Jul 2022 14:25:49 +0100,
Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> [ +CC: maz ]
>
> On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote:
> > On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> > > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> >
> > > > qcom is a DWC driver, so all the IRQ stuff happens in
> > > > dw_pcie_host_init(). qcom_pcie_remove() does call
> > > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > > > calls irq_dispose_mapping().
> > > >
> > > > I'm thoroughly confused by all this. But I suspect that maybe I
> > > > should drop the "make qcom modular" patch because it seems susceptible
> > > > to this problem:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> > >
> > > That should not be necessary.
> > >
> > > As you note above, interrupt handling is implemented in dwc core so if
> > > there are any issues here at all, which I doubt, then all of the dwc
> > > drivers that currently can be built as modules would be broken and
> > > this would need to be fixed in core.
> >
> > I don't know yet whether there's an issue. We need a clear argument
> > for why there is or is not. The fact that others might be broken is
> > not an argument for breaking another one ;)
>
> It's not breaking anything that is currently working, and if there's
> some corner case during module unload, that's not the end of the world
> either.

It may not be the end of the world for you, but you have absolutely no
idea what dangling pointers to kernel memory will do on a user's
machine, nor how this can be further exploited. Unloading a module
should never result in an unsafe kernel.

> It's a feature useful for developers and no one expects remove code to
> be perfect (e.g. resilient against someone trying to break it by doing
> things in parallel, etc.).

If that's a feature for you while you are developing, then please keep
this change as part of your own hacking toolbox. IMO the upstream
kernel shouldn't be subjected to this.

>
> > > I've been using the modular pcie-qcom patch for months now, unloading
> > > and reloading the driver repeatedly to test power sequencing, without
> > > noticing any problems whatsoever.
> >
> > Pali's commit log suggests that unloading the module is not, by
> > itself, enough to trigger the problem:
> >
> > https://lore.kernel.org/linux-pci/20220709161858.15031-1-pali@xxxxxxxxxx/
> >
> > Can you test the scenario he mentions?
>
> Turns out the pcie-qcom driver does not support legacy interrupts so
> there's no risk of there being any lingering mappings if I understand
> things correctly.

It still does MSIs, thanks to dw_pcie_host_init(). If you can remove
the driver while devices are up and running with MSIs allocated, this
may get ugly if things align the wrong way (for example, if a driver
still holds a reference to an irq_desc or irq_data).
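
To make the ordering concern concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: every name in it (toy_irq,
toy_domain, toy_map_irq() and so on) is hypothetical, and the -EBUSY
checks are an assumption made for illustration, not a description of
what irq_domain_remove() or irq_dispose_mapping() actually do. It only
shows the invariant at stake: teardown is safe once nothing still
refers to a mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_IRQS 4

/* Toy stand-in for one interrupt; NOT the kernel's struct irq_desc. */
struct toy_irq {
	bool mapped;	/* a mapping exists in the domain */
	int users;	/* consumers currently holding the interrupt */
};

/* Toy stand-in for an interrupt domain; NOT struct irq_domain. */
struct toy_domain {
	struct toy_irq irqs[TOY_MAX_IRQS];
};

static void toy_domain_init(struct toy_domain *d)
{
	for (size_t i = 0; i < TOY_MAX_IRQS; i++) {
		d->irqs[i].mapped = false;
		d->irqs[i].users = 0;
	}
}

/* Create a mapping (think: an MSI allocated for an endpoint). */
static int toy_map_irq(struct toy_domain *d)
{
	for (int i = 0; i < TOY_MAX_IRQS; i++) {
		if (!d->irqs[i].mapped) {
			d->irqs[i].mapped = true;
			return i;
		}
	}
	return -ENOSPC;
}

/* An endpoint driver takes a reference to the interrupt. */
static int toy_request_irq(struct toy_domain *d, int irq)
{
	if (irq < 0 || irq >= TOY_MAX_IRQS || !d->irqs[irq].mapped)
		return -EINVAL;
	d->irqs[irq].users++;
	return 0;
}

/* The endpoint driver drops its reference. */
static void toy_free_irq(struct toy_domain *d, int irq)
{
	assert(irq >= 0 && irq < TOY_MAX_IRQS && d->irqs[irq].users > 0);
	d->irqs[irq].users--;
}

/* Dispose of a mapping; refused while a consumer still holds it. */
static int toy_dispose_mapping(struct toy_domain *d, int irq)
{
	if (irq < 0 || irq >= TOY_MAX_IRQS || !d->irqs[irq].mapped)
		return -EINVAL;
	if (d->irqs[irq].users > 0)
		return -EBUSY;
	d->irqs[irq].mapped = false;
	return 0;
}

/* Tear the domain down only when no mapping is left behind. */
static int toy_domain_remove(struct toy_domain *d)
{
	for (size_t i = 0; i < TOY_MAX_IRQS; i++)
		if (d->irqs[i].mapped)
			return -EBUSY;
	return 0;
}

/* Walks the safe ordering; returns 0 if the model behaves as intended. */
static int toy_demo(void)
{
	struct toy_domain d;
	int irq;

	toy_domain_init(&d);
	irq = toy_map_irq(&d);
	if (irq < 0 || toy_request_irq(&d, irq))
		return -1;
	if (toy_domain_remove(&d) != -EBUSY)	/* still in use */
		return -1;
	toy_free_irq(&d, irq);
	if (toy_dispose_mapping(&d, irq))
		return -1;
	return toy_domain_remove(&d);		/* nothing left behind */
}
```

In this model, removing the domain while a consumer still holds the
interrupt is refused, and it only succeeds after the consumer has let
go and the mapping has been disposed of. Without that ordering, the
consumer would be left pointing at freed state.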

M.

--
Without deviation from the norm, progress is not possible.