Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Kai Henningsen (kai@khms.westfalen.de)
26 Aug 1997 03:08:00 +0200


darin@connectnet.com (Darin Johnson) wrote on 21.08.97 in <199708211803.LAA12633@connectnet1.connectnet.com>:

> > From: erik@arbat.com (Erik Corry)
>
> > Most people have objections to decisions made in Unicode. This
> > is inevitable in a standard of this size, on a subject that
> > raises such emotions.
>
> Yes, this is true. But then, why was such a wide ranging standard
> imposed in the first place? What's the history here anyway? I get

That should be bloody obvious. People - _all_ people - hate dealing with
multiple character sets.

> the impression that the standards body didn't have a broad enough
> membership base. Thus, you have a few people trying to solve problems

All the national standards organizations in the world, not broad enough?
Ha.

> for themselves, and imposing their solution on everyone. A standard
> this size should take maybe 10-20 years to do right.

I'm glad you aren't on that committee, then.

> > For that matter, ASCII is not in alphabetical order. For that
> > the order would have to be AaBbCcDdEeFfGgHhIi etc.
>
> Well then, I guess you don't mind doing text processing in EBCDIC!
> Sort order is very important.

Sort order is important. But cultural sort order (as opposed to any odd
sort order) _cannot_ be done via naked byte order and picking the right
character set. It's not even possible for English - you want to sort

Andy
boring
John

and no naked byte order will ever give you this.
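
To make that concrete, here is a small Python sketch (my choice of language, nothing in this thread depends on it) that sorts those three names by raw code point, which for ASCII is the same as byte order, and then again with a case-folded key. The case-folded key is only a rough stand-in for a real cultural collation; a proper one would be locale-specific and is not shown here.

```python
names = ["boring", "John", "Andy"]

# Naked code-point order: in ASCII every uppercase letter sorts
# before every lowercase letter, so "John" lands ahead of "boring".
byte_order = sorted(names)

# Case-folded key: a crude approximation of dictionary order,
# used here only as an assumption to illustrate the difference.
folded_order = sorted(names, key=str.casefold)

print(byte_order)    # ['Andy', 'John', 'boring']
print(folded_order)  # ['Andy', 'boring', 'John']
```

The point is that the wanted order comes from a comparison rule applied on top of the character codes, not from the codes themselves.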

> It is somewhat moot though. A standard that no one uses isn't a standard.

Well, as Unicode is definitely used in Windows (95, NT - and, thus, by
everyone using those systems), that doesn't seem to be a problem here.
It's used all right. (No, it's not used _only_ by Windows, but that alone
counts for a pretty large market segment.)

> Another ugly part is, you don't know what encoding most FS's actually
> use. That is, if you've got a file name on ext2fs, how do you know
> how to convert it to UTF-8? Or an imported ufs disk? What if ext2fs
> has some files in one encoding, and others in a different one?

That's why you want to standardize those on UTF-8. You _don't_ want to
have the FS have different names in different character sets.
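
A file name on such a file system is just a string of bytes with no encoding tag attached, so any conversion has to guess. A minimal Python sketch of one possible guess follows; the function name `to_utf8_name` and the Latin-1 fallback are my assumptions for illustration, not anything ext2fs or ufs defines.

```python
def to_utf8_name(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return a UTF-8 version of a file name given as raw bytes.

    Heuristic (an assumption, not a standard): if the bytes already
    decode as UTF-8, keep them; otherwise treat them as `fallback`.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # Not valid UTF-8: reinterpret in the assumed legacy encoding.
        return raw.decode(fallback).encode("utf-8")


# The same visible name "café" stored in two different encodings:
latin1_name = b"caf\xe9"
utf8_name = b"caf\xc3\xa9"

print(to_utf8_name(latin1_name) == utf8_name)  # True
print(to_utf8_name(utf8_name) == utf8_name)    # True
```

The guess can misfire: a legacy name whose bytes happen to form valid UTF-8 is passed through unchanged, and the fallback is wrong for anyone who did not use Latin-1. That ambiguity is exactly what goes away once every name is stored as UTF-8 to begin with.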

Oh, btw, HFS+ (Apple's new FS to replace HFS) does use Unicode filenames
for exactly this reason ...

Regards, Kai