Re: [PATCH v9 10/13] exfat: add nls operations

From: Pali Rohár
Date: Fri Jan 03 2020 - 04:40:35 EST


On Thursday 02 January 2020 16:20:33 Namjae Jeon wrote:
> This adds the implementation of nls operations for exfat.
>
> Signed-off-by: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
> Signed-off-by: Sungjong Seo <sj1557.seo@xxxxxxxxxxx>
> ---
> fs/exfat/nls.c | 809 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 809 insertions(+)
> create mode 100644 fs/exfat/nls.c
>
> diff --git a/fs/exfat/nls.c b/fs/exfat/nls.c
> new file mode 100644
> index 000000000000..af52328e28ff
> --- /dev/null
> +++ b/fs/exfat/nls.c

...

> +static int exfat_convert_uni_to_ch(struct nls_table *nls, unsigned short uni,
> + unsigned char *ch, int *lossy)
> +{
> + int len;
> +
> + ch[0] = 0x0;
> +
> + if (uni < 0x0080) {
> + ch[0] = uni;
> + return 1;
> + }
> +
> + len = nls->uni2char(uni, ch, MAX_CHARSET_SIZE);
> + if (len < 0) {
> + /* conversion failed */
> + if (lossy != NULL)
> + *lossy |= NLS_NAME_LOSSY;
> + ch[0] = '_';
> + return 1;
> + }
> + return len;
> +}

Hello! This function takes one UCS-2 character in host endianness and
converts it to one byte (via the specified 8-bit encoding).

> +static int __exfat_nls_uni16s_to_vfsname(struct super_block *sb,
> + struct exfat_uni_name *p_uniname, unsigned char *p_cstring,
> + int buflen)
> +{
> + int i, j, len, out_len = 0;
> + unsigned char buf[MAX_CHARSET_SIZE];
> + const unsigned short *uniname = p_uniname->name;
> + struct nls_table *nls = EXFAT_SB(sb)->nls_io;
> +
> + i = 0;
> + while (i < MAX_NAME_LENGTH && out_len < (buflen - 1)) {
> + if (*uniname == '\0')
> + break;
> +
> + len = exfat_convert_uni_to_ch(nls, *uniname, buf, NULL);
> + if (out_len + len >= buflen)
> + len = buflen - 1 - out_len;
> + out_len += len;
> +
> + if (len > 1) {
> + for (j = 0; j < len; j++)
> + *p_cstring++ = buf[j];
> + } else { /* len == 1 */
> + *p_cstring++ = *buf;
> + }
> +
> + uniname++;
> + i++;
> + }
> +
> + *p_cstring = '\0';
> + return out_len;
> +}
> +

This function takes a UCS-2 buffer in host endianness and converts it to
a string in the specified 8-bit encoding.

> +
> +int exfat_nls_uni16s_to_vfsname(struct super_block *sb,
> + struct exfat_uni_name *uniname, unsigned char *p_cstring,
> + int buflen)
> +{

Looking at the code, this function is called from dir.c to translate
the exfat filename buffer stored on the filesystem into the format
expected by the VFS layer.

On an exfat filesystem, file names are always stored in UTF-16LE...

> + if (EXFAT_SB(sb)->options.utf8)
> + return __exfat_nls_utf16s_to_vfsname(sb, uniname, p_cstring,
> + buflen);
> + return __exfat_nls_uni16s_to_vfsname(sb, uniname, p_cstring, buflen);

... and therefore the above "__exfat_nls_uni16s_to_vfsname" function must
expect a UTF-16LE buffer and not just a UCS-2 buffer in host endianness.

So two other things need to be done: convert each code unit from little
endian to host endianness, and then process the buffer as UTF-16 and not
only as UCS-2.

I see that the kernel NLS module is missing a function for converting
a UTF-16 string to UTF-32 (an encoding in which every code point is
represented by just one u32 variable). The kernel has only
utf16s_to_utf8s() and utf8_to_utf32().
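
To illustrate what such a UTF-16 to UTF-32 step would have to do, here
is a minimal userspace sketch. The name sketch_utf16_to_utf32() and the
choice of U+FFFD for unpaired surrogates are my assumptions for
illustration only; this is not an existing kernel function, and it
expects the input already converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: unpaired surrogates are replaced by U+FFFD. */
#define SKETCH_REPLACEMENT 0xFFFDu

/*
 * Hypothetical helper: decode one code point from a host-endian
 * UTF-16 buffer of 'len' 16-bit units.
 *
 * Returns the number of units consumed (1 or 2), or 0 if len == 0.
 * The decoded code point is stored in *out.
 */
static size_t sketch_utf16_to_utf32(const uint16_t *in, size_t len,
				    uint32_t *out)
{
	uint16_t hi, lo;

	if (len == 0)
		return 0;

	hi = in[0];

	/* Anything outside the surrogate range is a plain BMP code point. */
	if (hi < 0xD800 || hi > 0xDFFF) {
		*out = hi;
		return 1;
	}

	/* A high surrogate must be followed by a low surrogate. */
	if (hi <= 0xDBFF && len >= 2) {
		lo = in[1];
		if (lo >= 0xDC00 && lo <= 0xDFFF) {
			*out = 0x10000u +
			       (((uint32_t)(hi - 0xD800) << 10) |
				(uint32_t)(lo - 0xDC00));
			return 2;
		}
	}

	/* Lone low surrogate, or high surrogate without its pair. */
	*out = SKETCH_REPLACEMENT;
	return 1;
}
```

A UCS-2 only loop, like the one quoted above, would hand each half of
a surrogate pair to uni2char() separately instead of combining them
into a single code point first.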

> +}

Btw, have you tested this exfat implementation on a big endian
system? I think it cannot work there because the conversion from
UTF-16LE to UTF-16 in host endianness (UTF-16BE in that case) is missing.

--
Pali Rohár
pali.rohar@xxxxxxxxx