[PATCH 0/4] UTF-8ifying the kernel sources

From: David Eger
Date: Tue Mar 23 2004 - 18:03:07 EST



Following are four patches against Linux 2.6.4 for charset clean-up.
That is, converting the sources to be UTF-8 text. Please consider.

Thankfully, the vast majority of the kernel sources (some 15860 files)
are 7-bit clean ASCII. The rest (274 files) are mostly ISO Latin-1.
However, there are spots of pure junk (userland output in the docs and
some corruption that's creeped in over time), box-drawing characters,
EUC-JP, and ISO-2022-JP.

The only controversial patch here is patch 4, which changes the
output of the kernel to userspace (via /proc, mostly). It sets the
policy that the kernel should output UTF-8 instead of ISO Latin-1
when printing things that are not 7-bit ASCII. Details below.

-dte

Patch 1: linux-2.6.4-utf8-cleanup-auto.diff.bz2
===============================================
237 ISO Latin-1 files auto-converted to UTF-8.
All changes are to documentation or to the comments of code.

Patch 2: linux-2.6.4-utf8-cleanup-jp.diff
=========================================
arch/v850/kernel/as85ep1.ld
arch/v850/kernel/as85ep1.c

Two files containing Japanese comments - in two different charsets!
Unfortunately, emacs doesn't understand CJK pages in UTF-8.
cat, less, and vim do. (though vim gets confused when you look
at the diff, since it is a mix of two charsets...)

Patch 3: linux-2.6.4-utf8-cleanup-wrong.diff
============================================
drivers/video/amifb.c - +- sign (NOTE: X's .ttf files just don't have it)
Documentation/i2c/i2c-protocol - NBSP, but why? (made regular space)
arch/i386/kernel/cpu/cyrix.c - NBSP, but why? (made regular space)
include/linux/802_11.h - why the non-standard dash? (made regular dash)
scripts/docproc.c - why the bizarre spelling for specific? (fixed)
fs/ext2/xattr.c - bad ASCII art (made regular pipe - fixed)
fs/ext3/xattr.c - bad ASCII art (made regular pipe - fixed)
arch/arm/nwfpe/fpopcode.h - line-drawing characters (fixed)
include/asm-m68k/atarihw.h - 0x94? no, it's an Ã, for BjÃrn
include/asm-m68k/atariints.h - 0x94? no, it's an Ã, for BjÃrn

Patch 4: linux-2.6.4-utf8-cleanup-cstrings2utf8.diff
====================================================
arch/ppc/platforms/proc_rtas.c - a C string w/"degrees": exports to proc
arch/ppc64/kernel/rtas-proc.c - a C string w/"degrees": exports to proc
drivers/macintosh/therm_adt7467.c - temperature reporting (degrees sign)
- several printk's, output to a devfs interface, MODULE_PARAM_DESC(),
drivers/mtd/chips/cfi_probe.c - time reporting (micro sign)
- printk's in the DEBUG code
drivers/net/wireless/netwave_cs.c - module version string
(author's name - but it doesn't seem to be *used* for anything...)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/