Re: [PATCH] scripts/spdxcheck.py: Lets strictly read license files in utf-8

From: Thomas Gleixner
Date: Wed Jul 07 2021 - 05:00:37 EST


Nishanth,

On Fri, Jul 02 2021 at 20:21, Nishanth Menon wrote:
> Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
> unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text

Sigh. Why are people adding such things w/o running this script in the
first place?

> While python will barf at it with:
>
> FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
> Traceback (most recent call last):
>   File "scripts/spdxcheck.py", line 244, in <module>
>     spdx = read_spdxdata(repo)
>   File "scripts/spdxcheck.py", line 47, in read_spdxdata
>     for l in open(el.path).readlines():
>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
>
> While it is indeed debatable if 'Licensor.' used in the license file
> needs unicode quotes, instead, let us force spdxcheck to read utf-8
> instead.

s/let us//

Ditto for the $subject. See Documentation/process/ for further enlightenment.

> Reported-by: Rahul T R <r-ravikumar@xxxxxx>
> Signed-off-by: Nishanth Menon <nm@xxxxxx>

With that fixed:

Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
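
For reference, a minimal standalone sketch of the failure mode and of the
approach the patch describes (reading with an explicit utf-8 encoding). This
is not the actual spdxcheck.py change; read_lines() and demo() are
hypothetical names used only for illustration, and the sample text merely
mimics the curly quotes around 'Licensor.' in the license file.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    # Hypothetical helper: pass the encoding explicitly so the result
    # does not depend on the locale the script happens to run under.
    with open(path, encoding=encoding) as f:
        return f.readlines()


def demo():
    # U+201C / U+201D are encoded in UTF-8 as multi-byte sequences that
    # start with 0xe2, the byte the 'ascii' codec rejected in the report.
    text = "the \u201cLicensor.\u201d\n"
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(text.encode("utf-8"))
        path = tmp.name
    try:
        try:
            read_lines(path, encoding="ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        return ascii_failed, read_lines(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Without an explicit encoding, open() falls back to the locale's preferred
encoding, which is presumably why the script only fails in environments
with an ASCII/C locale.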