Re: confusion and case problems: utf8 <-> iocharset

From: Andrey Borzenkov
Date: Thu Jul 13 2006 - 12:24:06 EST


Eduard Bloch wrote:

> Hello (to whom it may concern),
>
> I am trying to understand how the charset mapping works with VFAT/Joliet,
> and I found some inconsistencies between user expectations, the docs, and
> the actual behaviour.
>
> Users view:
>
> VFAT, NTFS and Joliet use a Unicode charset for storing the names
> internally.

Nope. VFAT uses only the short (8.3) name as long as the name complies with
MS-DOS rules; that name is stored in the codepage character set.

[...]
>
> Second:
> there is the "utf8" option. How exactly does that differ from
> iocharset=utf8? There is no clear explanation in vfat.txt. What happens
> if you use both options, especially if iocharset!=utf8? Which one is
> preferred?
>

You actually need both; utf8 only says it is OK not to try to mangle names;
it does not replace the iocharset option.
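
As a rough illustration of passing both, here is a small sketch that only
prints a mount command line instead of running it. The helper name
vfat_mount_cmd, the device, the mount point and the charset values are all
placeholders I made up for the example; pick the codepage and iocharset that
match your own setup.

```shell
#!/bin/sh
# Print (do not run) a vfat mount command that passes codepage, iocharset
# and utf8 together. vfat_mount_cmd is a hypothetical helper; the device,
# mount point and charset values below are placeholders, not from this thread.
vfat_mount_cmd() {
    dev=$1 dir=$2 codepage=$3 iocharset=$4
    printf 'mount -t vfat -o codepage=%s,iocharset=%s,utf8 %s %s\n' \
        "$codepage" "$iocharset" "$dev" "$dir"
}

vfat_mount_cmd /dev/sdX1 /mnt/usb 850 iso8859-15
```

Printing the command first makes it easy to check the option string before
running it as root.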

> Third:
> how can I disable all that funny letter case conversions? They are not
> described anywhere properly,

man mount

> nor the way to disable them.

man mount

> IMO there are
> two problems:
>
> - what you write to the FS is not the same as what "ls" shows you later.
> E.g. ABW becomes "abw" but "ABWÖ" becomes "ABWÖ". Abcd becomes "Abcd"
> but "ABC" becomes "abc". Does it make sense? NO.

Tell this to Microsoft. It is how VFAT works. Although the umlaut should
probably be considered MS-DOS-safe, at least with the proper codepage option.


> I would like to stop the kernel playing such games, I had enough of
> such trouble back in my Windows 98 times.
>
> - this case conversion can actually break things. When iocharset=utf-8
> and utf8 are used, then you cannot access the data with the same
> name after storing it.
>

Mounting with shortname=mixed is probably what you want.
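
The difference between the shortname= modes can be modelled roughly as
follows. This is only a sketch based on the description in mount(8), not the
kernel code: shown_name is a made-up helper, and it assumes that only names
made of uppercase ASCII letters, digits and underscores in 8.3 form are stored
as a bare short name, while everything else gets a long name entry that is
shown unchanged.

```shell
#!/bin/sh
# Simplified model of what "ls" shows for a name created on vfat.
# Usage: shown_name MODE NAME   (MODE is "lower" or "mixed")
# Assumption: an all-uppercase ASCII 8.3 name is stored only as a short name;
# any other name also gets a long name entry and is displayed as written.
shown_name() {
    mode=$1 name=$2
    if printf '%s\n' "$name" |
        LC_ALL=C grep -Eq '^[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?$'; then
        case $mode in
            # shortname=lower: short names are lowercased on display
            lower) printf '%s\n' "$name" | LC_ALL=C tr 'A-Z' 'a-z' ;;
            # shortname=mixed: short names are displayed as stored
            *)     printf '%s\n' "$name" ;;
        esac
    else
        printf '%s\n' "$name"
    fi
}

shown_name lower ABW     # abw
shown_name lower Abcd    # Abcd
shown_name mixed ABW     # ABW
```

With the lower mode this reproduces the ABW -> abw and Abcd -> Abcd cases
quoted above; with mixed the short name comes back the way it was stored.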

> zombie:/tmp# mkdir test/TEST
> zombie:/tmp# ls test
> test
> zombie:/tmp# ls test/test
> zombie:/tmp# ls test/TEST
> ls: test/TEST: No such file or directory

{pts/1}% sudo mount -t vfat -o loop,shortname=mixed,utf8,uid=bor /var/tmp/test.dos /tmp/x
{pts/1}% LC_ALL=C ll /tmp/x/test
total 2
drwxr-xr-x 2 bor root 2048 Jul 13 20:06 TEST/
{pts/1}% LC_ALL=C ll /tmp/x/test/TEST
total 0
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*

The filesystem is foobared already because it contains both uppercase and
lowercase versions of the same names. This is what happens when you use the
wrong codepage.

-andrey
