Re: [PATCH] nls: add surrogate pair support in nls utf8.

From: Alan Stern
Date: Sun Dec 04 2011 - 12:08:42 EST


On Sun, 4 Dec 2011, Namjae Jeon wrote:

> It allows surrogate pair in nls utf8.

That's a pretty brief description.

> --- a/fs/nls/nls_utf8.c
> +++ b/fs/nls/nls_utf8.c
> @@ -30,13 +30,24 @@ static int char2uni(const unsigned char *rawstring, int boundlen, wchar_t *uni)
> {
> 	int n;
> 	unicode_t u;
> +	u16 *op;
>
> +	op = uni;
> 	n = utf8_to_utf32(rawstring, boundlen, &u);
> -	if (n < 0 || u > MAX_WCHAR_T) {
> +	if (n < 0 || u > UNICODE_MAX) {
> 		*uni = 0x003f; /* ? */
> 		return -EINVAL;
> 	}
> -	*uni = (wchar_t) u;
> +
> +	if (u >= PLANE_SIZE) {
> +		u -= PLANE_SIZE;
> +		*op++ = (wchar_t) (SURROGATE_PAIR |
> +				((u >> 10) & SURROGATE_BITS));
> +		*op++ = (wchar_t) (SURROGATE_PAIR |
> +				SURROGATE_LOW | (u & SURROGATE_BITS));
> +	} else
> +		*op++ = (wchar_t) u;
> +
> 	return n;
> }

Firstly, have you checked whether the callers of this function expect
to receive back more than one 16-bit value? Maybe you will overrun
their buffers by doing this.
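
To make the concern concrete, here is a minimal userspace sketch (not
kernel code). The helper names utf16_units_needed() and
put_utf16_bounded() are made up for illustration, and the constants are
the standard UTF-16 surrogate values rather than the macros from the
patch. It shows that any code point at or above 0x10000 needs two
16-bit units, so a store into a one-unit buffer has to be refused
rather than performed:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit units one code point
 * occupies when stored as UTF-16. */
static int utf16_units_needed(uint32_t cp)
{
	return cp >= 0x10000 ? 2 : 1;
}

/* Hypothetical bounded store: writes cp as UTF-16 into out[], which
 * has room for 'room' units. Returns the number of units written, or
 * -1 (writing nothing) if the code point does not fit. */
static int put_utf16_bounded(uint32_t cp, uint16_t *out, size_t room)
{
	int need = utf16_units_needed(cp);

	if ((size_t)need > room)
		return -1;

	if (need == 2) {
		cp -= 0x10000;
		out[0] = (uint16_t)(0xd800 | ((cp >> 10) & 0x3ff)); /* high surrogate */
		out[1] = (uint16_t)(0xdc00 | (cp & 0x3ff));         /* low surrogate */
	} else {
		out[0] = (uint16_t)cp;
	}
	return need;
}
```

A caller that passes the address of a single wchar_t corresponds to
room == 1 here. The patched char2uni() has no such parameter, so it
cannot tell whether the second store is safe.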

Secondly, you shouldn't have to make all these changes. Just call
utf8s_to_utf16s(); then all you have to worry about is changing an
invalid character to a '?'.
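
As a rough illustration of that shape, here is a self-contained
userspace sketch. It does not call or reproduce the kernel's
utf8s_to_utf16s(); decode_utf8() and utf8_to_utf16_lossy() are
hypothetical stand-ins, and skipping exactly one byte after an invalid
sequence is an assumption of this sketch, not something the kernel
promises. The string conversion handles surrogate pairs and output
bounds, and the only special case left is the '?' substitution:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: reads one UTF-8 sequence from s[0..len).
 * Returns the number of bytes consumed, or -1 if the sequence is
 * truncated, malformed, overlong, a surrogate, or above U+10FFFF. */
static int decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
	static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t u;
	int n, i;

	if (len == 0)
		return -1;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xe0) == 0xc0) {
		n = 2; u = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		n = 3; u = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		n = 4; u = s[0] & 0x07;
	} else {
		return -1;
	}
	if ((size_t)n > len)
		return -1;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return -1;
		u = (u << 6) | (s[i] & 0x3f);
	}
	if (u < min_cp[n] || u > 0x10ffff || (u >= 0xd800 && u <= 0xdfff))
		return -1;
	*cp = u;
	return n;
}

/* Hypothetical string converter: UTF-8 in, UTF-16 out, with each
 * invalid byte replaced by '?'. Returns the number of 16-bit units
 * written, or -1 if out[] (maxout units) is too small. */
static int utf8_to_utf16_lossy(const unsigned char *s, size_t len,
			       uint16_t *out, size_t maxout)
{
	size_t used = 0;

	while (len > 0) {
		uint32_t cp;
		size_t need;
		int n = decode_utf8(s, len, &cp);

		if (n < 0) {		/* invalid: substitute and skip one byte */
			cp = '?';
			n = 1;
		}
		need = cp >= 0x10000 ? 2 : 1;
		if (used + need > maxout)
			return -1;
		if (need == 2) {
			cp -= 0x10000;
			out[used++] = (uint16_t)(0xd800 | (cp >> 10));
			out[used++] = (uint16_t)(0xdc00 | (cp & 0x3ff));
		} else {
			out[used++] = (uint16_t)cp;
		}
		s += n;
		len -= (size_t)n;
	}
	return (int)used;
}
```

The point of the design is that the output bound is an explicit
argument of the string-level routine, which a single-character
interface like char2uni() does not have.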

Alan Stern
