Re: [PATCH] of: base: Skip CPU nodes with non-"okay"/"disabled" status

From: Matthias Schiffer
Date: Tue Nov 16 2021 - 03:32:21 EST


On Mon, 2021-11-15 at 12:23 -0500, Frank Rowand wrote:
> Hi Matthias,
>
> On 11/15/21 3:13 AM, Matthias Schiffer wrote:
> > On Sun, 2021-11-14 at 14:41 -0500, Frank Rowand wrote:
> > > On 11/8/21 3:48 AM, Matthias Schiffer wrote:
> > > > Allow fully disabling CPU nodes using status = "fail". Having no status
> > > > property at all is still interpreted as "okay" as usual.
> > > >
> > > > This allows a bootloader to change the number of available CPUs (for
> > > > example when a common DTS is used for SoC variants with different numbers
> > > > of cores) without deleting the nodes altogether, which could require
> > > > additional fixups to avoid dangling phandle references.
> > > >
> > > > References:
> > > > - https://www.lkml.org/lkml/2020/8/26/1237
> > > > - https://www.spinics.net/lists/devicetree-spec/msg01007.html
> > > > - https://github.com/devicetree-org/dt-schema/pull/61
> > > >
> > > > Signed-off-by: Matthias Schiffer <matthias.schiffer@xxxxxxxxxxxxxxx>
> > > > ---
> > > > drivers/of/base.c | 29 +++++++++++++++++++++++++++++
> > > > 1 file changed, 29 insertions(+)
> > > >
> > > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > > index 61de453b885c..4e9973627c8d 100644
> > > > --- a/drivers/of/base.c
> > > > +++ b/drivers/of/base.c
> > > > @@ -650,6 +650,32 @@ bool of_device_is_available(const struct device_node *device)
> > > > }
> > > > EXPORT_SYMBOL(of_device_is_available);
> > > >
> > > > +/**
> > > > + * __of_device_is_disabled - check if a device has status "disabled"
> > > > + *
> > > > + * @device: Node to check status for, with locks already held
> > > > + *
> > > > + * Return: True if the status property is set to "disabled",
> > > > + * false otherwise
> > > > + *
> > > > + * Most callers should use __of_device_is_available() instead, this function
> > > > + * only exists due to the special interpretation of the "disabled" status for
> > > > + * CPU nodes.
> > > > + */
> > > > +static bool __of_device_is_disabled(const struct device_node *device)
> > > > +{
> > > > + const char *status;
> > > > +
> > > > + if (!device)
> > > > + return false;
> > > > +
> > > > + status = __of_get_property(device, "status", NULL);
> > > > + if (status == NULL)
> > > > + return false;
> > > > +
> > > > + return !strcmp(status, "disabled");
> > > > +}
> > > > +
> > > > /**
> > > > * of_device_is_big_endian - check if a device has BE registers
> > > > *
> > > > @@ -817,6 +843,9 @@ struct device_node *of_get_next_cpu_node(struct device_node *prev)
> > > > of_node_put(node);
> > > > }
> > > > for (; next; next = next->sibling) {
> > > > + if (!__of_device_is_available(next) &&
> > > > + !__of_device_is_disabled(next))
> > >
> > > Shouldn't that just be a check to continue if the device is disabled?
> > >
> > > If adding a check for available, then all of the callers of for_each_of_cpu_node()
> > > need to be checked. There is at least one that is suspicious - arch/arm/mach-imx/platsmp.c
> > > has a comment:
> > >
> > > * Initialise the CPU possible map early - this describes the CPUs
> > > * which may be present or become present in the system.
>
> Thanks for the links to previous discussion you provided below. I had
> forgotten the previous discussion.
>
> In [2] Rob ended up saying that there were two options he was fine with.
> Either (or both), in of_get_next_cpu_node(),
>
> (1) use status of "fail" as the check or
>
> (2) use status of "disabled" as the check, conditional on !IS_ENABLED(CONFIG_PPC)
> "this would need some spec update"
> "Need to double check MIPS and Sparc though."
>
> Neither of these two options are what this patch does. It seems to me that
> option 1 is probably the easiest and least intrusive method.

My intuition is that a device with an unknown status value should not
be used. For most devices this is already handled by treating any value
that is not unset, "okay" or "ok" the same. For CPU nodes, this would
be the case by treating such values like "fail".

I did a quick grep through the in-tree Device Trees, and I did find a
few unusual status properties (none of them in CPU nodes though):

- Typo "failed" (4 cases)
- Typo "disable" (17 cases)
- "reserved" (19 cases)

"fail" appears 2 times, the rest is "okay", "ok" or "disabled".

I do not have a strong opinion on this though - for our concrete
usecase, checking for "fail" is fine, and we can treat unknown values
like "disabled" if you prefer that solution. Should "fail-*" status
values also be treated like "fail" then?




>
> -Frank
>
> > Previously, there were two option for the (effective) value of the
> > status of a device_node:
> >
> > - "okay", "ok" or unset
> > - anything else (which includes "disabled" and "fail")
> >
> > __of_device_is_available() checks which of these two is the case.
> >
> > With the new code, we have 3 cases for the status of CPU nodes:
> >
> > - "okay", "ok" or unset
> > - "disabled"
> > - anything else ("fail", ...)
> >
> > My patch will only change the behaviour in one case: When a CPU node
> > has a status that is not "okay", "ok", "disabled" or unset - which is
> > exactly the point of my patch.
> >
> > See also the change [1], which removed the !available check a while
> > ago, and the discussion in [2], which led us to the conclusion that
> > of_get_next_cpu_node() must not distinguish "okay" and "disabled" CPU
> > nodes and we instead need a third status to disable a CPU for real.
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/of/base.c?id=c961cb3be9064d1097ccc019390f8b5739daafc6
> > [2] https://www.lkml.org/lkml/2020/8/26/1237
> >
> >
> > >
> > > -Frank
> > >
> > > > + continue;
> > > > if (!(of_node_name_eq(next, "cpu") ||
> > > > __of_node_is_type(next, "cpu")))
> > > > continue;
> > > >
> > >
> > >
>
>