Re: [PATCH v1 1/1] Revert "pinctrl: avoid unsafe code pattern in find_pinctrl()"

From: Ferry Toth
Date: Wed Oct 18 2023 - 03:57:25 EST


Hi,

(resend due to html reject)

On 17-10-2023 23:43, Dmitry Torokhov wrote:
Hi Andy,

On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:
On Tue, Oct 17, 2023 at 08:59:05PM +0200, Linus Walleij wrote:
On Tue, Oct 17, 2023 at 8:34 PM Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
On Tue, Oct 17, 2023 at 08:18:23PM +0200, Linus Walleij wrote:

In the past some file system developers have told us (Ulf will know)
that we can't rely on the block device enumeration to identify
devices, and that we must use things such as sysfs or the
UUID volume label in ext4 to identify storage.

While I technically might agree with you, this has been working for everybody
since day 1 of Intel Merrifield support (added circa v4.8); now _user
space_ is broken.

Actually, I don't agree with that, just relaying it. I would prefer that we
solve exactly the problem that we are facing here: some random unrelated
code or similar affecting enumeration order of mmc devices.

Sorry, but the era of static configuration where one has a well defined
order in which things are probed and numbered has long gone. The right
answer is either device aliases that provide stable numbering on a
board that is not dependent on scheduler behavior, mutexes
implementation (how they deal with writer starvation, etc),
kernel/driver/subsystem linking order and myriad other things, or
mounting by UUID. The kernel does not provide any guarantees on the
stability of device probe and instantiation order.

If you think about it, it is the same issue as legacy GPIO numbering.
It was convenient some time ago, but now it is no longer suitable or
sufficient and could change when kernel is uprevved.


It's not the first time it happens to me, I have several devices that change
this enumeration order depending on whether an SD card is plugged
in or not, and in a *BIG* way: the boot partition on the soldered eMMC
changes enumeration depending on whether an SD card is inserted
or not, and that has never been fixed (because of the above).

This is not the problem I have. I haven't added any SD card, hardware
configuration is the same. The sole difference in the whole setup is
this revert applied or not.

Yes, I guess there is contention on this mutex, and the fact that we
are now taking it once and not twice makes a difference in which probes
happen. If you look at the logs, you will see that even before the patch
controllers did not enumerate in the order of PCI functions:

[ 36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
[ 36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
[ 36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA

You are referring to the order printed in dmesg. But actually

mmc0 = 0000:00:01.0
mmc1 = 0000:00:01.2
mmc2 = 0000:00:01.3

And this has been so for like 8 years. See f.i. https://github.com/edison-fw/meta-intel-edison/issues/135
(this is with Yocto, so using systemd, the issue discussed there is not related to this but to card detection iirc)
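To make the distinction concrete, here is a small sketch (plain Python, not part of any kernel or Yocto tooling) that parses the three log lines quoted above and separates the order in which the messages were printed from the index-to-PCI mapping. The regular expression and the helper name `parse_hosts` are my own assumptions for illustration; only the log lines come from this thread.

```python
import re

# The three log lines quoted above, copied verbatim from the thread.
DMESG = """\
[ 36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
[ 36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
[ 36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA
"""

# Hypothetical pattern for this message format: timestamp, host name, PCI address.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<host>mmc(?P<idx>\d+)): "
    r"SDHCI controller on PCI \[(?P<pci>[0-9a-f:.]+)\]"
)


def parse_hosts(log):
    """Return a list of (timestamp, host index, PCI address) tuples."""
    hosts = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            hosts.append((float(m["ts"]), int(m["idx"]), m["pci"]))
    return hosts


hosts = parse_hosts(DMESG)

# Order in which the messages were printed (sorted by timestamp).
log_order = [f"mmc{idx}" for ts, idx, pci in sorted(hosts)]

# Mapping from host index to PCI function (sorted by index).
index_to_pci = {
    f"mmc{idx}": pci for ts, idx, pci in sorted(hosts, key=lambda h: h[1])
}
pci_order_by_index = list(index_to_pci.values())

print("printed order:", log_order)
print("index -> PCI :", index_to_pci)
```

Sorted by timestamp the hosts appear out of index order, while sorted by index the PCI functions are ascending, which is the mapping listed above.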

So you have mmc2 instantiated before mmc1 even before the patch. This
happens because we now have

.probe_type = PROBE_PREFER_ASYNCHRONOUS,

in sdhci_driver structure in drivers/mmc/host/sdhci-pci-core.c. It just
happened that even with asynchronous probing your storage did end up on
mmc0 originally and you were happy.

I wonder, could you please post entire dmesg for your system?


That said, device trees are full of stuff like this:

aliases {
serial0 = &uart_AO;
mmc0 = &sd_card_slot;
mmc1 = &sdhc;
};

And Rob, AFAIU, is against aliases.

Rob might not want them, but they are the reality and are present for
multiple classes of devices and I believe are here to stay.


Notice how this enumeration gets defined by the aliases.

Can you do the same with device properties? (If anyone can
answer that question it's Dmitry!)

No, and why should we?

Because device properties are not device tree, they are just some
Linux thing so we can do whatever we want. Just checking if
Dmitry has some idea that would solve this for good, he usually
replies quickly.

OK.

I think the right answer is "fix the userspace" really in this case. We
could also try to extend of_alias_get_id() to see if we could pass some
preferred numbering on x86. But this will again be fragile if the
knowledge resides in the driver and is not tied to a particular board
(as it is in DT case): there could be multiple controllers, things will
be shifting board to board...
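As a rough illustration of what "fix the userspace" could look like, the sketch below looks up an MMC host by the PCI address of its controller instead of relying on the mmcN index. It assumes the usual sysfs layout in which each entry under /sys/class/mmc_host is a symlink into the parent device's directory; the function name `find_mmc_host_by_pci` and the `sysfs_root` parameter are hypothetical and exist only for this example.

```python
import os


def find_mmc_host_by_pci(pci_addr, sysfs_root="/sys"):
    """Return the mmcN host name whose sysfs path contains pci_addr, or None.

    Assumption: <sysfs_root>/class/mmc_host/mmcN resolves to a path that
    includes the controller's PCI address as one of its components.
    """
    class_dir = os.path.join(sysfs_root, "class", "mmc_host")
    for name in sorted(os.listdir(class_dir)):
        real = os.path.realpath(os.path.join(class_dir, name))
        # Match on a whole path component, not on a substring.
        if pci_addr in real.split(os.sep):
            return name
    return None


if __name__ == "__main__" and os.path.isdir("/sys/class/mmc_host"):
    # Address taken from the log lines earlier in this thread.
    print(find_mmc_host_by_pci("0000:00:01.0"))
```

A boot script could use the returned name to locate the right block device no matter which index the host was assigned; mounting by UUID stays the simpler option when only the filesystem matters.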

Thanks.