Re: [PATCH] of: base: Skip CPU nodes with non-"okay"/"disabled" status

From: Frank Rowand
Date: Tue Nov 16 2021 - 10:11:31 EST


Hi Mathias,

On 11/16/21 3:32 AM, Matthias Schiffer wrote:
> On Mon, 2021-11-15 at 12:23 -0500, Frank Rowand wrote:
>> Hi Matthias,
>>
>> On 11/15/21 3:13 AM, Matthias Schiffer wrote:
>>> On Sun, 2021-11-14 at 14:41 -0500, Frank Rowand wrote:
>>>> On 11/8/21 3:48 AM, Matthias Schiffer wrote:

It turns out I confused myself a bit...

The first email provided the clue that I needed:

>>>>> Allow fully disabling CPU nodes using status = "fail". Having no status
>>>>> property at all is still interpreted as "okay" as usual.

I managed to forget that sentence before I looked at the code in the patch.

>>>>>
>>>>> This allows a bootloader to change the number of available CPUs (for
>>>>> example when a common DTS is used for SoC variants with different numbers
>>>>> of cores) without deleting the nodes altogether, which could require
>>>>> additional fixups to avoid dangling phandle references.
>>>>>
>>>>> References:
>>>>> - https://www.lkml.org/lkml/2020/8/26/1237
>>>>> - https://www.spinics.net/lists/devicetree-spec/msg01007.html
>>>>> - https://github.com/devicetree-org/dt-schema/pull/61
>>>>>
>>>>> Signed-off-by: Matthias Schiffer <matthias.schiffer@xxxxxxxxxxxxxxx>
>>>>> ---
>>>>> drivers/of/base.c | 29 +++++++++++++++++++++++++++++
>>>>> 1 file changed, 29 insertions(+)
>>>>>
>>>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>>>> index 61de453b885c..4e9973627c8d 100644
>>>>> --- a/drivers/of/base.c
>>>>> +++ b/drivers/of/base.c
>>>>> @@ -650,6 +650,32 @@ bool of_device_is_available(const struct device_node *device)
>>>>> }
>>>>> EXPORT_SYMBOL(of_device_is_available);
>>>>>
>>>>> +/**
>>>>> + * __of_device_is_disabled - check if a device has status "disabled"
>>>>> + *
>>>>> + * @device: Node to check status for, with locks already held
>>>>> + *
>>>>> + * Return: True if the status property is set to "disabled",
>>>>> + * false otherwise
>>>>> + *
>>>>> + * Most callers should use __of_device_is_available() instead, this function
>>>>> + * only exists due to the special interpretation of the "disabled" status for
>>>>> + * CPU nodes.
>>>>> + */
>>>>> +static bool __of_device_is_disabled(const struct device_node *device)
>>>>> +{
>>>>> + const char *status;
>>>>> +
>>>>> + if (!device)
>>>>> + return false;
>>>>> +
>>>>> + status = __of_get_property(device, "status", NULL);
>>>>> + if (status == NULL)
>>>>> + return false;
>>>>> +
>>>>> + return !strcmp(status, "disabled");
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * of_device_is_big_endian - check if a device has BE registers
>>>>> *
>>>>> @@ -817,6 +843,9 @@ struct device_node *of_get_next_cpu_node(struct device_node *prev)
>>>>> of_node_put(node);
>>>>> }

So in the following test:

>>>>> for (; next; next = next->sibling) {
>>>>> + if (!__of_device_is_available(next) &&
>>>>> + !__of_device_is_disabled(next))

(!__of_device_is_available(next) &&
!__of_device_is_disabled(next))

is meant to detect a status of "fail" or "fail-sss". Since I forget the sentence
about "fail" from early in the email, I had difficulty in interpreting the intent
of the (!ok && !disabled) form of the test. The intent of the test would be more
clear if it was checking _for_ "fail" instead of checking for _not_ the other
possible status values.

So you _are_ checking for the status of "fail" check (Rob's choice #1 below)
and I just did not understand that was the intent of the patch.

So I am fine with the patch if you change the above logic to check for
"fail" or "fail-sss".

It would also be good to add a comment to the of_get_next_cpu_node() header
comment that "fail" nodes are excluded.

Sorry for the confusion.

-Frank

>>>>
>>>> Shouldn't that just be a check to continue if the device is disabled?
>>>>
>>>> If adding a check for available, then all of the callers of for_each_of_cpu_node()
>>>> need to be checked. There is at least one that is suspicious - arch/arm/mach-imx/platsmp.c
>>>> has a comment:
>>>>
>>>> * Initialise the CPU possible map early - this describes the CPUs
>>>> * which may be present or become present in the system.
>>
>> Thanks for the links to previous discussion you provided below. I had
>> forgotten the previous discussion.
>>
>> In [2] Rob ended up saying that there were two options he was fine with.
>> Either (or both), in of_get_next_cpu_node(),
>>
>> (1) use status of "fail" as the check or
>>
>> (2) use status of "disabled" as the check, conditional on !IS_ENABLED(CONFIG_PPC)
>> "this would need some spec update"
>> "Need to double check MIPS and Sparc though."
>>
>> Neither of these two options are what this patch does. It seems to me that
>> option 1 is probably the easiest and least intrusive method.
>
> My intuition is that a device with an unknown status value should not
> be used. For most devices this is already handled by treating any value
> that is not unset, "okay" or "ok" the same. For CPU nodes, this would
> be the case by treating such values like "fail".
>
> I did a quick grep through the in-tree Device Trees, and I did find a
> few unusual status properties (none of them in CPU nodes though):
>
> - Typo "failed" (4 cases)
> - Typo "disable" (17 cases)
> - "reserved" (19 cases)
>
> "fail" appears 2 times, the rest is "okay", "ok" or "disabled".
>
> I do not have a strong opinion on this though - for our concrete
> usecase, checking for "fail" is fine, and we can treat unknown values
> like "disabled" if you prefer that solution. Should "fail-*" status
> values also be treated like "fail" then?
>
>
>
>
>>
>> -Frank
>>
>>> Previously, there were two option for the (effective) value of the
>>> status of a device_node:
>>>
>>> - "okay", "ok" or unset
>>> - anything else (which includes "disabled" and "fail")
>>>
>>> __of_device_is_available() checks which of these two is the case.
>>>
>>> With the new code, we have 3 cases for the status of CPU nodes:
>>>
>>> - "okay", "ok" or unset
>>> - "disabled"
>>> - anything else ("fail", ...)
>>>
>>> My patch will only change the behaviour in one case: When a CPU node
>>> has a status that is not "okay", "ok", "disabled" or unset - which is
>>> exactly the point of my patch.
>>>
>>> See also the change [1], which removed the !available check a while
>>> ago, and the discussion in [2], which led us to the conclusion that
>>> of_get_next_cpu_node() must not distinguish "okay" and "disabled" CPU
>>> nodes and we instead need a third status to disable a CPU for real.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/of/base.c?id=c961cb3be9064d1097ccc019390f8b5739daafc6
>>> [2] https://www.lkml.org/lkml/2020/8/26/1237
>>>
>>>
>>>>
>>>> -Frank
>>>>
>>>>> + continue;
>>>>> if (!(of_node_name_eq(next, "cpu") ||
>>>>> __of_node_is_type(next, "cpu")))
>>>>> continue;
>>>>>
>>>>
>>>>
>>
>>
>