Re: [PATCH pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled"

From: Vladimir Oltean
Date: Thu Jun 01 2023 - 04:12:17 EST


On Wed, May 31, 2023 at 03:24:46PM -0500, Bjorn Helgaas wrote:
> I guess I should have asked "what bad things happen without this patch
> and without the DT 'disabled' status"?

Well, now that you put it this way, I do realize that things are not so
ideal for me.

Our drivers for the functions of this device were already checking for
of_device_is_available() during probe. So, reverting the core PCIe
patch, they would still not register a network interface, which is good.

However (and this is the bad part), multiple functions of this PCIe
device unfortunately share a common memory, which is not zeroized by
hardware, and so, to avoid multi-bit ECC errors, it must be zeroized by
software, using some memory space accesses from all functions that have
access to that shared memory (every function zeroizes its piece of it).
This, sadly, includes functions which have status = "disabled". See
commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused
ports too").

What we used to do was start probing a bit in enetc_pf_probe(), enable
the memory space, zeroize our part of the shared memory, then check
of_device_is_available() and finally, we disable the memory space again
and exit probing with -ENODEV.

That is not possible anymore with the core patch, because the PCIe core
will not probe our disabled functions at all anymore.

The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation
to essentially describe on-chip memory spaces, which are always present.
So presumably, a different system-level solution to initialize those
shared memories (U-Boot?) may be chosen, if implementing this workaround
in Linux puts too much pressure on the PCIe core and the way in which it
does things. Initially I didn't want to do this in prior boot stages
because we only enable the RCEC in Linux, nothing is broken other than
the spurious AER messages, and, you know.. the kernel may still run
indefinitely on top of bootloaders which don't have the workaround applied.
So working around it in Linux avoids one dependency.