[PATCH] checkpatch: use utf-8 match for spell checking

From: Antonio Borneo
Date: Tue Dec 12 2023 - 04:43:27 EST


The current code that checks for misspelling verifies, in a more
complex regex, if $rawline matches [^\w]($misspellings)[^\w]

Being $rawline a byte-string, a utf-8 character in $rawline can
match the non-word-char [^\w].
E.g.:
./script/checkpatch.pl --git 81c2f059ab9
WARNING: 'ment' may be misspelled - perhaps 'meant'?
#36: FILE: MAINTAINERS:14360:
+M: Clément Léger <clement.leger@xxxxxxxxxxx>
^^^^

Use a utf-8 version of $rawline for spell checking.

Signed-off-by: Antonio Borneo <antonio.borneo@xxxxxxxxxxx>
Reported-by: Clément Le Goffic <clement.legoffic@xxxxxxxxxxx>
---
scripts/checkpatch.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25fdb7fda112..58646bd6ef56 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3477,7 +3477,8 @@ sub process {
# Check for various typo / spelling mistakes
if (defined($misspellings) &&
($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
- while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
+ my $rawline_utf8 = decode("utf8", $rawline);
+ while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
my $typo = $1;
my $blank = copy_spacing($rawline);
my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);

base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.42.0