Re: unicode

Alex Belits
Fri, 15 May 1998 00:55:43 -0700 (PDT)

On 14 May 1998, Jan Vroonhof wrote:

> > It's assumed that files and filenames shouldn't be re-encoded
> > automatically.
> If you are not reencoding then why take the trouble of passing charset
> labels around?

Because re-encoding is the last and worst thing that one would want to
happen to them -- charset/language labels are necessary for displaying
characters with fonts that are mapped to charsets, and for applying rules
that are mapped to languages (capitalization, hyphenation, phonetic
matching). The initial assumption is that adding reasonable support for
fonts and rules is possible without exposing any other encoding or charset
to the application. Then no one re-encodes anything except when handling
charset-specific devices or charset-specific filesystems.
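
A rough sketch of that model, to make it concrete: the text travels as its
original bytes plus a charset/language label, and the label is used only to
select a font, never to convert the bytes. Everything named here
(LabeledText, FONT_FOR_CHARSET, pick_font, draw_request) is hypothetical and
exists only for illustration; the font patterns are ordinary XLFD strings
whose last two fields name the charset.

```python
from dataclasses import dataclass

# Hypothetical lookup table: charset label -> XLFD font pattern.
# The last two XLFD fields (registry-encoding) identify the charset.
FONT_FOR_CHARSET = {
    "iso8859-1": "-*-fixed-medium-r-*-*-14-*-*-*-*-*-iso8859-1",
    "koi8-r": "-*-fixed-medium-r-*-*-14-*-*-*-*-*-koi8-r",
}


@dataclass(frozen=True)
class LabeledText:
    """Text kept in its original bytes, with labels travelling alongside."""
    data: bytes    # original bytes, never re-encoded
    charset: str   # selects the font
    language: str  # would select capitalization/hyphenation rules


def pick_font(text: LabeledText) -> str:
    """Choose a font from the charset label alone; the bytes are not read."""
    return FONT_FOR_CHARSET[text.charset]


def draw_request(text: LabeledText) -> tuple[str, bytes]:
    """Return the (font, bytes) pair that would be handed to the display."""
    return pick_font(text), text.data


name = LabeledText("привет".encode("koi8-r"), "koi8-r", "ru")
font, payload = draw_request(name)
print(font, payload == name.data)
```

The point of the sketch is only that the label does the work a conversion
would otherwise do: the display side receives the same bytes the
application was given.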

> Secondly it is very very difficult to make sure things
> do not get reencoded mostly because you cannot be sure what is a
> filename and what is not.

See above. The whole idea of re-encoding is completely foreign to this
model; it is used only if something at the low level can't handle an
arbitrary charset/font. Since in most cases the filesystem is ext2 (or
NFS and something over it) and the multilingual display is an X server,
re-encoding in the application will be unnecessary, because ext2 handles
bytes transparently except '/' and NUL, and X handles charsets in their
original form.
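
To illustrate the "transparent except '/' and NUL" point, here is a minimal
sketch of the only check a name component needs under that rule. The
function name is_valid_component is made up for this example, and the
sketch deliberately ignores other limits (name length, the reserved "." and
".." entries).

```python
def is_valid_component(name: bytes) -> bool:
    """True if `name` is non-empty and contains neither '/' nor NUL.

    No charset is consulted: the name is treated purely as bytes.
    """
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


word = "привет"
# The same word in three charsets gives three different byte strings;
# each one passes the check and would be stored exactly as given.
for charset in ("koi8-r", "iso8859-5", "utf-8"):
    raw = word.encode(charset)
    print(charset, raw, is_valid_component(raw))
```

Since nothing in the check depends on the charset, the filesystem layer
gives the application no reason to re-encode a name before storing it.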

