[PATCH v2 1/3] scripts: add spelling_sanitizer.sh script

From: Zhen Lei
Date: Wed Jun 16 2021 - 08:26:41 EST


The file scripts/spelling.txt records a large number of spelling
"mistake||correction" pairs. These entries are currently kept in sorted
order, but the ordering is not strictly enforced. In addition, anyone who
wants to add new pairs must either sort them manually or write a script,
which is clearly a waste of labor. So add this script. For all spelling
"mistake||correction" pairs, sort on "correction" first, then on "mistake",
and remove duplicates. Sorting on "mistake" first was not chosen
because it is uncontrollable.
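
As an illustration of the ordering described above, here is a minimal
sketch that applies the same sed and sort invocations to a few made-up
pairs on stdin instead of editing spelling.txt in place. The function name
sanitize_pairs and the sample entries are hypothetical and exist only for
this example; comment handling is left out.

```shell
#!/bin/sh
# Sketch only: the sample pairs and the function name are illustrative.

# Use native byte values for the sort order, as the script does
export LC_ALL=C

sanitize_pairs() {
    # Convert codespell-style "mistake ==> correction" lines to
    # "mistake||correction", then sort on field 3 ("correction"),
    # then on field 1 ("mistake"), and drop duplicate lines.
    sed 's/ ==> /||/' | sort -u -t '|' -k 3 -k 1
}

result=$(printf '%s\n' \
    'recieve||receive' \
    'adress||address' \
    'recieve||receive' \
    'addres ==> address' \
    'acces||access' | sanitize_pairs)

printf '%s\n' "$result"
```

Because '|' is the field separator, the empty string between the two bars
is field 2, which is why "correction" is addressed as field 3.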

Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
---
scripts/spelling_sanitizer.sh | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
create mode 100755 scripts/spelling_sanitizer.sh

diff --git a/scripts/spelling_sanitizer.sh b/scripts/spelling_sanitizer.sh
new file mode 100755
index 000000000000..603bb7e0e66b
--- /dev/null
+++ b/scripts/spelling_sanitizer.sh
@@ -0,0 +1,27 @@
+#!/bin/sh -efu
+# SPDX-License-Identifier: GPL-2.0
+
+# To get the traditional sort order that uses native byte values
+export LC_ALL=C
+
+cd "${0%/*}"
+
+src=spelling.txt
+comments=`sed -n '/#/p' $src`
+
+# Convert codespell-style "mistake ==> correction" lines to the current format
+sed -r -i 's/ ==> /||/' $src
+
+# For all spelling "mistake||correction" pairs (non-comment lines):
+# Sort based on "correction", then "mistake", and remove duplicates
+sed -n '/#/!p' $src | sort -u -t '|' -k 3 -k 1 -o $src
+
+# Backfill comment lines
+ln=0
+echo "$comments" | while read line
+do
+ ln=$((ln + 1))
+ sed -i "$ln i\\$line" $src
+done
+
+cd - > /dev/null
--
2.25.1