Re: [PATCH V9 2/5] PCI: Create device tree node for selected devices

From: Bjorn Helgaas
Date: Wed Jun 21 2023 - 16:22:44 EST


In subject, IIUC this patch does not actually create device tree nodes
for selected devices. It looks like it:

- Adds an of_pci_make_dev_node() *interface* that can be used to
create this node

- Creates such a node for *every* bridge

- Does nothing at all for "selected devices" or the Xilinx Alveo

On Wed, Jun 21, 2023 at 10:34:06AM -0700, Lizhi Hou wrote:
> The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> spaces from multiple hardware peripherals to its PCI BAR. Normally,
> the PCI core discovers devices and BARs using the PCI enumeration process.
> There is no infrastructure to discover the hardware peripherals that are
> present in a PCI device, and which can be accessed through the PCI BARs.
>
> For Alveo PCI card, the card firmware provides a flattened device tree to
> describe the hardware peripherals on its BARs. The Alveo card driver can
> load this flattened device tree and leverage device tree framework to
> generate platform devices for the hardware peripherals eventually.

The Alveo details are relevant to the quirk patch but not to *this*
patch.

But the reason for creating a node for every bridge device *is*
relevant and should be included here, since that change affects
everybody that uses OF.

> Apparently, the device tree framework requires a device tree node for the
> PCI device. Thus, it can generate the device tree nodes for hardware
> peripherals underneath. Because PCI is self discoverable bus, there might
> not be a device tree node created for PCI devices. This patch is to add
> support to generate device tree node for PCI devices.

s/This patch is to add/Add/

> Added a kernel option. When the option is turned on, the kernel will
> generate device tree nodes for PCI bridges unconditionally.

s/Added a kernel option/Add a PCI_DYNAMIC_OF_NODES config option/
(Be specific, and way what the patch does, not what you did.)

> Initially, the basic properties are added for the dynamically generated
> device tree nodes.

Make this specific, e.g., list the specific properties added.

> +config PCI_DYNAMIC_OF_NODES
> + bool "Create Devicetree nodes for PCI devices"
> + depends on OF
> + select OF_DYNAMIC
> + help
> + This option enables support for generating device tree nodes for some
> + PCI devices. Thus, the driver of this kind can load and overlay
> + flattened device tree for its downstream devices.
> +
> + Once this option is selected, the device tree nodes will be generated
> + for all PCI bridges.

Is there a convention for using "devicetree" vs "device tree"? The
help message uses both and it would be nice to only use one or the
other.

> @@ -501,8 +501,10 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
> * to rely on this function (you ship a firmware that doesn't
> * create device nodes for all PCI devices).
> */
> - if (ppnode)
> + if (ppnode && of_property_present(ppnode, "interrupt-map"))

Maybe this deserves a comment? The connection between "interrupt-map"
and the rest of this patch isn't obvious to me.

Also, it looks like this happens for *everybody*, regardless of
PCI_DYNAMIC_OF_NODES, which seems a little suspect. If it's an
unrelated bug fix it should be a different patch.

> break;
> + else
> + ppnode = NULL;

> +void of_pci_make_dev_node(struct pci_dev *pdev)
> +{
> + struct device_node *ppnode, *np = NULL;
> + const char *pci_type = "dev";
> + struct of_changeset *cset;
> + const char *name;
> + int ret;
> +
> + /*
> + * If there is already a device tree node linked to this device,
> + * return immediately.
> + */
> + if (pci_device_to_OF_node(pdev))
> + return;
> +
> + /* Check if there is device tree node for parent device */
> + if (!pdev->bus->self)
> + ppnode = pdev->bus->dev.of_node;
> + else
> + ppnode = pdev->bus->self->dev.of_node;
> + if (!ppnode)
> + return;
> +
> + if (pci_is_bridge(pdev))
> + pci_type = "pci";

Initialize pci_type = "dev" here instead of way up top:

if (pci_is_bridge(pdev))
pci_type = "pci";
else
pci_type = "dev";

> + name = kasprintf(GFP_KERNEL, "%s@%x,%x", pci_type,
> + PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));

> +static int of_pci_prop_ranges(struct pci_dev *pdev, struct of_changeset *ocs,
> + struct device_node *np)
> +{
> + struct of_pci_range *rp;
> + struct resource *res;
> + int i = 0, j, ret;
> + u32 flags, num;
> + u64 val64;
> +
> + if (pci_is_bridge(pdev)) {
> + num = PCI_BRIDGE_RESOURCE_NUM;
> + res = &pdev->resource[PCI_BRIDGE_RESOURCES];
> + } else {
> + num = PCI_STD_NUM_BARS;
> + res = &pdev->resource[PCI_STD_RESOURCES];
> + }
> +
> + rp = kcalloc(num, sizeof(*rp), GFP_KERNEL);
> + if (!rp)
> + return -ENOMEM;
> +
> + for (j = 0; j < num; j++) {

Initialize i = 0 here so it's connected with the use:

for (i = 0, j = 0; j < num; ...)

> + if (!resource_size(&res[j]))
> + continue;
> +
> + if (of_pci_get_addr_flags(&res[j], &flags))
> + continue;
> +
> + val64 = res[j].start;
> + of_pci_set_address(pdev, rp[i].parent_addr, val64, 0, flags,
> + false);
> + if (pci_is_bridge(pdev)) {
> + memcpy(rp[i].child_addr, rp[i].parent_addr,
> + sizeof(rp[i].child_addr));
> + } else {
> + /*
> + * For endpoint device, the lower 64-bits of child
> + * address is always zero.

For the non-OF folks (like me), can you say what the semantics of
parent_addr vs child_addr are? I suppose maybe parent_addr is an
address on the primary side of a bridge and child_addr is the
corresponding address on the secondary side?

And PCI bridges don't perform address translation, so they are
identical?

> + */
> + rp[i].child_addr[0] = j;
> + }

> +int of_pci_add_properties(struct pci_dev *pdev, struct of_changeset *ocs,
> + struct device_node *np)
> +{
> + int ret = 0;
> +
> + if (pci_is_bridge(pdev)) {
> + ret |= of_changeset_add_prop_string(ocs, np, "device_type",
> + "pci");
> + }
> +
> + ret |= of_pci_prop_ranges(pdev, ocs, np);
> + ret |= of_changeset_add_prop_u32(ocs, np, "#address-cells",
> + OF_PCI_ADDRESS_CELLS);
> + ret |= of_changeset_add_prop_u32(ocs, np, "#size-cells",
> + OF_PCI_SIZE_CELLS);
> + ret |= of_pci_prop_reg(pdev, ocs, np);
> + ret |= of_pci_prop_compatible(pdev, ocs, np);
> +
> + /*
> + * The added properties will be released when the
> + * changeset is destroyed.
> + */

I don't think it's meaningful to OR together the "negative error
values" returned by all these functions. Presumably those are things
like -EINVAL, -ENOMEM, etc. ORing them together is admittedly
non-zero, but yields nonsense.

> + return ret;

> +static inline void
> +of_pci_make_dev_node(struct pci_dev *pdev)
> +{
> +}
> +
> +static inline void
> +of_pci_remove_node(struct pci_dev *pdev)
> +{
> +}

Pull these functions all onto one line, like other similar stubs in
this file.

> +#endif /* CONFIG_PCI_DYNAMIC_OF_NODES */

Unnecessary comment since this is all 10 lines.