Re: [PATCH v3] driver core: Fix device link device name collision

From: Michael Walle
Date: Sat Jan 09 2021 - 11:50:26 EST


Am 2021-01-08 18:22, schrieb Saravana Kannan:
On Fri, Jan 8, 2021 at 12:16 AM Michael Walle <michael@xxxxxxxx> wrote:

Am 2021-01-08 02:24, schrieb Saravana Kannan:
> The device link device's name was of the form:
> <supplier-dev-name>--<consumer-dev-name>
>
> This can cause name collision as reported here [1] as device names are
> not globally unique. Since device names have to be unique within the
> bus/class, add the bus/class name as a prefix to the device names used
> to
> construct the device link device name.
>
> So the devuce link device's name will be of the form:
> <supplier-bus-name>:<supplier-dev-name>--<consumer-bus-name>:<consumer-dev-name>
>
> [1] -
> https://lore.kernel.org/lkml/20201229033440.32142-1-michael@xxxxxxxx/
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 287905e68dd2 ("driver core: Expose device link details in
> sysfs")
> Reported-by: Michael Walle <michael@xxxxxxxx>
> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> ---
[..]

The changes are missing for the error path and
devlink_remove_symlinks(),
right?

Removing symlinks doesn't need the name. Just needs the "handle". So
we are good there.

I don't get it. What is the "handle"? Without the patch below
kernfs_remove_by_name() in sysfs_remove_link will return -ENOENT. With
the patch it will return 0.

And even if it would work, how is this even logical:

snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
ret = sysfs_create_link(&sup->kobj, &link->link_dev.kobj, buf);
if (ret)
goto err_con_dev;
snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup));
ret = sysfs_create_link(&con->kobj, &link->link_dev.kobj, buf);
if (ret)
goto err_sup_dev;
[..]
err_sup_dev:
snprintf(buf, len, "consumer:%s", dev_name(con));
sysfs_remove_link(&sup->kobj, buf);

You call sysfs_create_link("consumer:bus_name:dev_name") but the
corresponding rollback is sysfs_remove_link("consumer:dev_name"), that
is super confusing.

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4140a69dfe18..385e16d92874 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -485,7 +485,7 @@ static int devlink_add_symlinks(struct device *dev,
goto out;

err_sup_dev:
- snprintf(buf, len, "consumer:%s", dev_name(con));
+ snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con),
dev_name(con));
sysfs_remove_link(&sup->kobj, buf);
err_con_dev:
sysfs_remove_link(&link->link_dev.kobj, "consumer");
@@ -508,7 +508,9 @@ static void devlink_remove_symlinks(struct device
*dev,
sysfs_remove_link(&link->link_dev.kobj, "consumer");
sysfs_remove_link(&link->link_dev.kobj, "supplier");

- len = max(strlen(dev_name(sup)), strlen(dev_name(con)));
+ len = max(strlen(dev_bus_name(sup)) + strlen(dev_name(sup)),
+ strlen(dev_bus_name(con)) + strlen(dev_name(con)));
+ len += strlen(":");
len += strlen("supplier:") + 1;
buf = kzalloc(len, GFP_KERNEL);
if (!buf) {
@@ -516,9 +518,9 @@ static void devlink_remove_symlinks(struct device
*dev,
return;
}

- snprintf(buf, len, "supplier:%s", dev_name(sup));
+ snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup),
dev_name(sup));
sysfs_remove_link(&con->kobj, buf);
- snprintf(buf, len, "consumer:%s", dev_name(con));
+ snprintf(buf, len, "consumer:%s:%s", dev_bus_name(sup),
dev_name(con));

btw this should be dev_bus_name(con).

sysfs_remove_link(&sup->kobj, buf);
kfree(buf);
}

With these changes:

Tested-by: Michael Walle <michael@xxxxxxxx>

Greg,

I think it's good to pick up this version if you don't see any issues.

Why so fast?


This at least make the warning go away.

Phew!

BUT, there is somesthing strange or at least I don't get it:

# find /sys/bus/pci/devices/0000:00:00.0/ -name "consumer\:*"
# find /sys/bus/pci/devices/0000:00:00.1/ -name "consumer\:*"
/sys/bus/pci/devices/0000:00:00.1/consumer:mdio_bus:0000:00:00.1:04
/sys/bus/pci/devices/0000:00:00.1/consumer:mdio_bus:0000:00:00.1

enetc0 (0000:00:00.0) has no consumers while enetc1 (0000:00:00.1)
has ones. Although both have PHYs connected. Here are the
corresonding device tree entries:

enetc0:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts?h=v5.11-rc2#n81

enetc1:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var4.dts?h=v5.11-rc2#n21

Why is there a link between enetc1 and its PHY (and MDIO bus)
but not for enetc0?

So a lot of subtle things could be going on here that make it look
like it's not working correctly but it's actually working fine. Since
fw_devlink=permissive is the default mode, all links that are created
are SYNC_STATE_ONLY links. These links are deleted after their
consumers probe. So if you really want to see all the "real" links
persist, try booting with fw_devlink=on. You might have boot issues
though -- I'm working on that separately [1]. Also, SYNC_STATE_ONLY
links can be created between the parent of a consumer and the supplier
(I won't get into the why here) depending on some ordering -- so that
might be causing some spurious looking links, but they aren't.

Another way to do what you are trying to do is to enable the logs in
device_link_add() and look at them to see if all the links are created
as you'd expect.

btw, here are all links:

# ls /sys/class/devlink/
pci:0000:00:00.1--mdio_bus:0000:00:00.1
platform:5000000.iommu--pci:0000:00:00.0
platform:5000000.iommu--pci:0000:00:00.1
platform:5000000.iommu--pci:0000:00:00.2
platform:5000000.iommu--pci:0000:00:00.3
platform:5000000.iommu--pci:0000:00:00.5
platform:5000000.iommu--pci:0000:00:00.6
platform:5000000.iommu--pci:0001:00:00.0
platform:5000000.iommu--pci:0002:00:00.0
platform:5000000.iommu--platform:2140000.mmc
platform:5000000.iommu--platform:2150000.mmc
platform:5000000.iommu--platform:22c0000.dma-controller
platform:5000000.iommu--platform:3100000.usb
platform:5000000.iommu--platform:3110000.usb
platform:5000000.iommu--platform:3200000.sata
platform:5000000.iommu--platform:8000000.crypto
platform:5000000.iommu--platform:8380000.dma-controller
platform:5000000.iommu--platform:f080000.display
platform:f1f0000.clock-controller--platform:f080000.display
regulator:regulator.0--i2c:0-0050
regulator:regulator.0--i2c:1-0057
regulator:regulator.0--i2c:2-0050
regulator:regulator.0--platform:3200000.sata

As you can see, most of the links that fw_devlink created are gone.
Because all the consumers probed. Any remaining ones you see here are
non-SYNC_STATE_ONLY links created by the driver/frameworks or cases
where consumers haven't probed. My guess is that only the first one is
of this criteria and it doesn't hurt anything here. Try booting with
fw_devlink=on and check this list. That'll give you a better idea.

Thanks for explaining. I'll try that.

-michael