Re: [PATCH 0/3] Fix dt-validate issues on qemu dtbdumps due to dt-bindings

From: Conor.Dooley
Date: Tue Aug 09 2022 - 15:03:47 EST


On 09/08/2022 15:14, Rob Herring wrote:
> On Mon, Aug 08, 2022 at 10:01:11PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>> On 08/08/2022 22:34, Jessica Clarke wrote:
>>> On Fri, Aug 05, 2022 at 05:28:42PM +0100, Conor Dooley wrote:
>>>> From: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
>>>> The final patch adds some new ISA strings
>>>> which needs scrutiny from someone with more knowledge about what ISA
>>>> extension strings should be reported in a dt than I have.
>>>
>>> Listing every possible ISA string supported by the Linux kernel really
>>> is not going to scale...
>
> How does the kernel scale? (No need to answer)
>
>> Yeah, totally correct there. Case for adding a regex I suppose, but I
>> am not sure how to go about handling the multi-letter extensions or
>> if parsing them is required from a binding compliance point of view.
>> Hoping for some input from Palmer really.
>
> Yeah, looks like a regex pattern is needed.

I started pottering away at this, and so far I have arrived at:
rv64imaf?d?c?h?(_z[imafdqcbvkh]([a-z])*)*$

I suspect that more single-letter extensions should be added before
"h?" for completeness' sake, so it'd bloat out to:
rv64imaf?d?q?c?b?v?k?h?(_z[imafdqcbvkh]([a-z])*)*$

I checked a couple of different "bad" ISA strings against it and
nothing went up in flames, but my regex skills are far from great,
so I'm sure there are better ways to represent this.
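A quick way to poke at the two candidate patterns is a short Python sketch that runs a handful of strings through both. The sample strings are illustrative picks, not taken from any binding or kernel source. It uses re.search rather than re.fullmatch because JSON Schema does not implicitly anchor "pattern" (the Python jsonschema package applies it with re.search). One thing this shows is that, with no leading "^", a string with junk in front of "rv64" still matches.

```python
import re

# The two candidate patterns from this mail, copied verbatim.
NARROW = r"rv64imaf?d?c?h?(_z[imafdqcbvkh]([a-z])*)*$"
WIDE = r"rv64imaf?d?q?c?b?v?k?h?(_z[imafdqcbvkh]([a-z])*)*$"


def matches(pattern: str, isa: str) -> bool:
    # JSON Schema "pattern" is unanchored, so use re.search to mimic
    # how a schema validator would apply it.
    return re.search(pattern, isa) is not None


# Illustrative sample strings (chosen for this sketch).
SAMPLES = [
    "rv64imac",                   # current enum value
    "rv64imafdc",                 # current enum value
    "rv64imafdc_zicsr_zifencei",  # multi-letter extensions
    "rv64imafdcv",                # "v" is only allowed by WIDE
    "rv64imadfc",                 # single letters out of the assumed order
    "RV64IMAFDC",                 # upper case
    "rv32imac",                   # neither pattern covers rv32
    "foorv64imac",                # junk prefix: neither pattern has a leading ^
]

if __name__ == "__main__":
    for isa in SAMPLES:
        print(f"{isa:28} narrow={matches(NARROW, isa)!s:5} wide={matches(WIDE, isa)}")
```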

Anyway, this pattern is based on my understanding that:
- the single-letter order is fixed, and we don't care about anything
that can't even do "ima"
- the multi-letter extensions are all in a "_z<foo>" format, where the
first letter of <foo> is a valid single-letter extension
- we don't care about the "e" extension from an OS PoV (this could be a
very flawed take...)
- after the first two chars (the "z" and the single-letter prefix), the
extension name could be an English word (ifencei, anyone?), so it's not
worth restricting the charset
- attempting to validate the contents of the multi-letter extensions
with dt-validate beyond the formatting is a futile, massively verbose
or unwieldy exercise at best
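The same assumptions can also be written out as explicit checks, which makes it easier to see what each part of the pattern is doing. This is a hypothetical sketch: SINGLE_ORDER, MULTI and follows_assumptions() are names made up here, not kernel or dtschema code. It should agree with the wider pattern (matched in full) on ordinary lowercase strings like the samples used in the checks.

```python
import re

# Fixed single-letter order assumed by the wider pattern; "e" is
# deliberately left out (hypothetical constant for this sketch).
SINGLE_ORDER = "imafdqcbvkh"

# Multi-letter format assumed above: "z", then a valid single-letter
# extension, then any lowercase letters. Contents are not checked.
MULTI = re.compile(r"z[imafdqcbvkh][a-z]*")


def follows_assumptions(isa: str) -> bool:
    """Hypothetical helper: check an ISA string against the assumptions."""
    base, *multi = isa.split("_")
    if not base.startswith("rv64"):
        return False
    singles = base[len("rv64"):]
    # Anything that can't even do "ima" is rejected.
    if not singles.startswith("ima"):
        return False
    # Every letter must be known and appear at most once, in the fixed order.
    positions = [SINGLE_ORDER.find(c) for c in singles]
    if -1 in positions:
        return False
    if any(a >= b for a, b in zip(positions, positions[1:])):
        return False
    # Each "_"-separated extension must only have the right format.
    return all(MULTI.fullmatch(ext) for ext in multi)
```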

Some or all of those assumptions could be very, very wrong, so if {someone,
anyone} wants to correct me, feel ***more*** than free.

Thanks,
Conor.

The patch would then look like:

diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
index d632ac76532e..1e54e7746190 100644
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -74,9 +74,7 @@ properties:
       insensitive, letters in the riscv,isa string must be all
       lowercase to simplify parsing.
     $ref: "/schemas/types.yaml#/definitions/string"
-    enum:
-      - rv64imac
-      - rv64imafdc
+    pattern: rv64imaf?d?q?c?b?v?k?h?(_z[imafdqcbvkh]([a-z])*)*$
 
   # RISC-V requires 'timebase-frequency' in /cpus, so disallow it here
   timebase-frequency: false