Re: A Great Idea (tm) about reimplementing NLS.

From: Bernd Eckenfels
Date: Sat Jun 18 2005 - 14:12:50 EST


In article <200506181804.21366.robin.rosenberg.lists@xxxxxxxxxx> you wrote:
> Every unicode character has exactly one UTF-8 representation.

Every unicode code point has exactly one UTF-8 representation, however there
are for a few glyphs multiple code points. And this is not only a problem
beause of homoglphys which look like/similiar, but also because of combining
characters vs. legacy characters. However thats more an issue of the user
interface (think IDN exploits).

Personally I think the on-disk filesystem format should be required to be
UTF-8, and its an open discussion if the syscalls accept UTF-8 or locale
byte encodings. Currently its a mess. We can learn from Windows here:)

Greetings
Bernd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/