Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

H. Peter Anvin (hpa@transmeta.com)
27 Aug 1997 06:34:04 GMT


Followup to: <Pine.SOL.3.95L.970826222613.8912B-100000@unix22.andrew.cmu.edu>
By author: Michael Poole <poole+@andrew.cmu.edu>
In newsgroup: linux.dev.kernel
>
> In my view, yes, for these reasons:
> - Filenames should contain an integral number of characters, even if an
> app tries to write a filename where the NAME_MAX-1'th byte isn't the last
> byte in the wide character. This is debatable -- one can argue back and
> forth all day whether or not that constitutes stupidity or misbehavior on
> the application's part, but in the end I think it will boil down to
> standards support or 'executive decision'.
>

This is a total lose. The kernel shouldn't *enforce* a certain
character set -- that is the kind of stupidity that Microsoft
operating systems get involved in. The kernel should treat a filename
as an anonymous sequence of bytes, except for '\0' (0x00) and '/'
(0x2f). The only exception should be to support foreign (non-UNIX)
filesystems.
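
To make the rule concrete, here is a minimal sketch of the only check
a byte-oriented kernel would need on a single path component. The
function name component_bytes_ok is hypothetical, and rejecting an
empty component is my assumption, not something any existing kernel
code is claimed to do in this form.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: returns true if the len
 * bytes at name are acceptable as one path component under the
 * "anonymous sequence of bytes" rule, i.e. any byte value is allowed
 * except 0x00 (terminator) and 0x2f ('/', the separator).
 */
static bool component_bytes_ok(const unsigned char *name, size_t len)
{
    if (len == 0)
        return false;               /* assumption: empty names rejected */

    for (size_t i = 0; i < len; i++) {
        if (name[i] == 0x00 || name[i] == 0x2f)
            return false;           /* the only two reserved bytes */
    }
    return true;                    /* no character set is consulted */
}
```

Note that the check never asks which encoding the bytes are in: a
UTF-8 name, a Latin-1 name and a run of arbitrary high bytes all pass
as long as they avoid the two reserved values.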

> However, my personal belief is that there should be a policy in
> the kernel to only allow whole characters to be stored; in this case the
> kernel will need to know what encoding is used for file names. I strongly
> suspect that this won't be implemented, though, due to either standards
> compliance or for the benefit of supporting multiple encodings. There is
> a very strong case to be made that character delineation should be left to
> user-space, and if that's what prevails, so be it; I think that libc
> should be able to implement the policy just as well as the kernel.

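Exactly; especially since a decoder for most multibyte encodings will
detect a character that is truncated by a null byte. In particular, a
UTF-8 decoder will do so, because the lead byte announces how many
continuation bytes must follow and a null byte is never one of them.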
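
As a rough user-space sketch of that detection: the names
utf8_seq_len and utf8_truncated_at_nul are hypothetical, and I am
assuming sequences of at most four bytes (the original UTF-8
definition allowed longer ones). It only looks for truncation; it
makes no attempt to reject overlong forms or other malformed input.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Total length of the sequence that lead byte b announces, or 0 if b
 * cannot start a sequence (a continuation byte or an invalid lead).
 */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;                   /* plain ASCII */
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/*
 * Returns true if the NUL-terminated string s ends in the middle of a
 * multibyte character, i.e. the terminator shows up where a
 * continuation byte (10xxxxxx) was still expected.
 */
static bool utf8_truncated_at_nul(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        size_t need = utf8_seq_len(*p);

        p++;                        /* consume the lead (or stray) byte */
        for (size_t i = 1; i < need; i++) {
            if (*p == 0)
                return true;        /* cut off by the null byte */
            if ((*p & 0xC0) != 0x80)
                break;              /* other damage: resynchronize here */
            p++;
        }
    }
    return false;
}
```

So a name cut after the first byte of a two-byte character is
reported as truncated, while the same name with both bytes present is
not, and all of it happens without any help from the kernel.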

-hpa

-- 
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
Always looking for a few good BOsFH.  **  Linux - the OS of global cooperation
        I am Baha'i -- ask me about it or see http://www.bahai.org/