[PATCH] checkpatch: handle utf8 while computing length of commit msg lines

From: Antonio Borneo
Date: Fri Oct 21 2022 - 15:16:57 EST


The check on the length of each commit message line uses
length($line), which counts bytes rather than characters.
If the line contains multi-byte UTF-8 characters, the byte count
can exceed the 75-character limit even when the line is short.

Decode the line as UTF-8 and count characters when checking the
line length.

Signed-off-by: Antonio Borneo <antonio.borneo@xxxxxxxxxxx>

---

It is not fully clear to me whether UTF-8 characters in the
commit message are acceptable, tolerated, or to be avoided.
The commit message of 15662b3e8644 ("checkpatch: add a --strict
check for utf-8 in commit logs") states:
	Some find using utf-8 in commit logs inappropriate.
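
For illustration only, a minimal sketch of the byte/character mismatch
described in the commit message. It assumes $line holds raw, undecoded
bytes, which is what the change relies on. The sample line and the
char_len() helper are hypothetical and not part of checkpatch.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: count characters in a line held as raw UTF-8 bytes.
sub char_len {
	my ($raw) = @_;
	return length(decode("utf8", $raw));
}

# Hypothetical commit-log line: 40 copies of U+00E9, each encoded
# as two bytes (0xC3 0xA9) in UTF-8.
my $line = "\xC3\xA9" x 40;

# length() on the raw string counts bytes, so this line trips a > 75 check.
print "bytes=", length($line), "\n";	# prints bytes=80

# After decoding, the same line is only 40 characters long.
print "chars=", char_len($line), "\n";	# prints chars=40
```

With the byte count the 40-character line would be reported as too
long; with the decoded count it passes.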


scripts/checkpatch.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 1e5e66ae5a52..eaad5da50554 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3220,7 +3220,7 @@ sub process {

 # Check for line lengths > 75 in commit log, warn once
 		if ($in_commit_log && !$commit_log_long_line &&
-		    length($line) > 75 &&
+		    length(decode("utf8", $line)) > 75 &&
 		    !($line =~ /^\s*[a-zA-Z0-9_\/\.]+\s+\|\s+\d+/ ||
 					# file delta changes
 		      $line =~ /^\s*(?:[\w\.\-\+]*\/)++[\w\.\-\+]+:/ ||

base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
--
2.38.0