Re: [RFC PATCH] of: device: Support 2nd sources of probeable but undiscoverable devices

From: Doug Anderson
Date: Wed Sep 27 2023 - 19:50:59 EST


Hi,

On Tue, Sep 26, 2023 at 7:37 PM Jeff LaBundy <jeff@xxxxxxxxxxx> wrote:
>
> Hi Doug,
>
> On Fri, Sep 22, 2023 at 05:11:10PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Sep 22, 2023 at 12:08 PM Rob Herring <robh+dt@xxxxxxxxxx> wrote:
> > >
> > > > > This seems like overkill to me. Do we really need groups and a mutex
> > > > > for each group? Worst case is what? 2-3 groups of 2-3 devices?
> > > > > Instead, what about extending "status" with another value
> > > > > ("fail-needs-probe"? (fail-xxx is a documented value)). Currently, the
> > > > > kernel would just ignore nodes with that status. Then we can process
> > > > > those nodes separately 1-by-1.
> > > >
> > > > My worry here is that this has the potential to impact boot speed in a
> > > > non-trivial way. While trackpads and touchscreens _are_ probable,
> > > > their probe routines are often quite slow. This is even mentioned in
> > > > Dmitry's initial patches adding async probe to the kernel. See commit
> > > > 765230b5f084 ("driver-core: add asynchronous probing support for
> > > > drivers") where he specifically brings up input devices as examples.
>
> Ideally, all but one driver in a group should bail out of probe quickly if
> the device is not populated. If not, I would consider that to be a bug or at
> least room for improvement in that driver.
>
> The reason input devices can take a while to probe is because they may be
> loading FW over I2C or performing some sort of calibration procedure; only
> one driver in the group should get that far.

Hmm, that's not my experience. Specifically I've seen i2c-hid devices
whose datasheets say that you're not allowed to talk i2c to them at
all for hundreds of milliseconds after you power them on. See, for
instance, "i2c-hid-of-goodix.c" which has a "post_gpio_reset_delay_ms"
of 180 ms and "i2c-hid-of-elan.c" which has one of 300 ms.

As I understand it these touchscreens have firmware on them and that
firmware can take a while to boot. Until the firmware boots they won't
respond over i2c. This is simply not something that Linux can do
anything about.

About the best you could do would be to add a board-specific driver
that understood that it could power up the rails, wait the maximum
amount of time that all possible touchscreens might need, and then
look for i2c ACKs. I'm still hoping to hear from Rob about how I would
get a board-specific driver to load on a DT system so I can
investigate / prototype this.


> > > We could add information on the class of device. touchscreen and
> > > touchpad aliases or something.
> >
> > Ah, I see. So something like "fail-needs-probe-<class>". The
> > touchscreens could have "fail-needs-probe-touchscreen" and the
> > trackpads could have "fail-needs-probe-trackpad" ? That could work. In
> > theory that could fall back to the same solution of grabbing a mutex
> > based on the group ID...
> >
> > Also: if having the mutex in the "struct device" is seen as a bad
> > idea, it would also be easy to remove. __driver_probe_device() could
> > just make a call like "of_device_probe_start()" at the beginning that
> > locks the mutex and then "of_device_probe_end()" that unlocks it. Both
> > of those calls could easily lookup the mutex in a list, which would
> > get rid of the need to store it in the "struct device".
> >
> >
> > > > That would lead me to suggest this:
> > > >
> > > > &i2c_bus {
> > > > trackpad-prober {
> > > > compatible = "mt8173-elm-hana-trackpad-prober";
> > > >
> > > > tp1: trackpad@10 {
> > > > compatible = "hid-over-i2c";
> > > > reg = <0x10>;
> > > > ...
> > > > post-power-on-delay-ms = <200>;
> > > > };
> > > > tp2: trackpad@20 {
> > > > compatible = "hid-over-i2c";
> > > > reg = <0x20>;
> > > > ...
> > > > post-power-on-delay-ms = <200>;
> > > > };
> > > > };
> > > > };
> > > >
> > > > ...but I suspect that would be insta-NAKed because it's creating a
> > > > completely virtual device ("mt8173-elm-hana-trackpad-prober") in the
> > > > device tree. I don't know if there's something that's functionally
> > > > similar that would be OK?
>
> This solution seems a bit confusing to me, and would require more edits
> to the dts each time a second source is added. It also means one would
> have to write a small platform driver for each group of devices, correct?

No matter what we need to add something to the dts each time a second
source is added, right?

While it's true that we'd end up with some extra drivers, if we do it
correctly we don't necessarily need a driver for each group of devices
nor even a driver per board. If several boards have very similar
probing requirements then, even if they have unique "compatible"
strings they could still end up using the same Linux driver.

I've actually been talking offline with folks on ChromeOS more about
this problem as well. Chen-Yu actually pointed at a patch series (that
never landed, I guess) that has some similar ideas [1]. I guess in
that case Hans was actually constructing device tree properties
manually in the driver. I was thinking more of having all of the
options listed in the device tree and then doing something that only
causes some of them to probe.

If Rob was OK with it, I guess I could have some sort of top-level
"hwmanager" node like Hans did and then have phandle links to all the
hardware that are managed by it. Then I could just change those to
"okay"?

Ideally, though, this could somehow use device tree "overlays" I
guess. That seems like almost a perfect fit. I guess the issue here,
though, is that I'd want the overlays bundled together with the
original DT and then the board-specific "hardware prober" driver to
actually apply the overlays after probing. Does that seem sensible?


[1] https://lore.kernel.org/linux-arm-kernel/20160901190820.21987-1-hdegoede@xxxxxxxxxx/