Re: A Great Idea (tm) about reimplementing NLS.

From: Måns Rullgård
Date: Sat Jun 18 2005 - 18:24:14 EST


Lukasz Stelmach <stlman@xxxxxxxxx> writes:

> MÃns RullgÃrd napisaÅ(a):
>>>>What do you do if the underlying filesystem can not store some unicode
>>>>characters that are allowed on others?
>>>
>>>That's why UTF-8 is suggested. UTF-8 has been developed to "fool" the
>>>software that need not to be aware of unicodeness of the text it manages
>>>to handle it without any hickups *and* to store in the text information
>>>about multibyte characters.What characters exactly you do mean? NULL?
>>>There is no NULL byte in any UTF-8 string except the one which
>>>terminates it.
>>
>> That's exactly how ext3, reiserfs, xfs, jfs, etc. all work. A few
>> filesystems are tagged as using some specific encoding. If your
>> filesystem is marked for iso-8859-1, what should a kernel with a
>> conversion mechanism do if a user tries to name a file ê?
>
> Return -ENOENT? I am guessing.

Doesn't seem very friendly.

> But please tell me what should do userland software if it runs with
> locale set to something.iso-8859-2 and finds ê in the directory?

I suppose it will display ÄÅ (0x80 doesn't seem be a printable
iso-8859-2 character). You told it to use iso-8859-2 in the first
place, so what do you expect?

> That is the same problem. And for now ISO 8-bit encodings are far
> more popolar and usefull with contemporary tools than UTF-8.

ISO 8-bit encodings are more common with characters they can
represent. These are a small minority of all characters commonly
used.

> That is why I think suggestion of a layer in the kernel that would
> translate filenames form utf-8 stored on the media to e.g. latin2
> used by tools is quite reasonable. Especially when there is more
> than one encoding for a particular language (think Russian,
> Polish). Even more, with such a facility transition would be much
> more greaceful since you could have utf-8 filesystem and then you
> can worry about tools other tools. The filesystem is already
> populated with UFT-8 names.

How is the kernel to know what to translate to/from?

>>>>I think UDF is a better filesystem for many types of media since it is
>>>>able to me more gently to the sectors storing the meta data than VFAT
>>>>ever will be.
>>>I've tried cd packet writing with UDF and it gives insane overhead of
>>>about 20%. What metadata you'd like to store for example on your
>>>flashdrive or a floppy disk?
>>
>> Filename, timestamps, all the usual.
>
> That's why IMHO FAT is quite enough for this purpose.

FAT has a bad habit of constantly hammering the same sectors over and
over. This can wear out cheap flash media in no time at all.

--
MÃns RullgÃrd
mru@xxxxxxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/