[PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support

From: Andrew Gabbasov
Date: Fri Jan 15 2016 - 03:45:49 EST


V3:

Patches 1 and 2 are not resent, since they have already been accepted
by the maintainer (patch 2 with some changes compared to V2).

Patches 3 - 5 rebased on top of updated patch 2.

Patch 6: Fixed a mistake in passing parameters to translate_to_linux():
the third buffer and its length, used for CRC calculation, should be
passed without the leading encoding character.

Patch 7: The main part of the conversion loop bodies is extracted into a
separate helper function. Also, some other modifications address the
maintainer's comments on V2.

V2:

The single patch was split into several commits for separate logical
steps. Also, some minor fixes were done in the code of the patches.

V1:

The current implementation has several issues in unicode.c, mostly related
to handling multi-byte characters in file names:

- loop ending conditions in the udf_CS0toUTF8 and udf_CS0toNLS functions do
not properly catch the end of the output buffer in the case of multi-byte
characters, allowing out-of-bounds writes and memory corruption;

- udf_UTF8toCS0 and udf_NLStoCS0 do not check the upper bound of the output
buffer at all, also allowing out-of-bounds writes and memory corruption;

- udf_translate_to_linux does not take multi-byte characters into account
at all (although it is called after converting to UTF8 or NLS): the maximal
length of the extension is counted as 5 bytes, which may be incorrect with
multi-byte characters; when the CRC and extension are inserted for long
names (near the end of the buffer), they are placed at a fixed position at
the end, which can fall in the middle of a multi-byte character;

- when being converted from CS0 to UTF8 (or NLS), the name can be truncated
(even if the sizes in bytes of the input and output buffers are the same),
but the translating function that follows does not know about it and does
not insert the CRC, as the specs require.
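
To illustrate the "middle of a multi-byte character" problem from the
list above, here is a minimal userspace sketch of choosing a cut position
in a UTF-8 buffer. It is not code from these patches: the helper name
utf8_safe_prefix() is hypothetical, and it assumes the input is valid
UTF-8 (NLS charsets would need a charset-specific check instead).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of fs/udf/unicode.c): return the longest
 * prefix length <= max that does not split a UTF-8 character.
 * Assumes s[0..len) holds valid UTF-8.
 */
size_t utf8_safe_prefix(const unsigned char *s, size_t len, size_t max)
{
	size_t cut;

	assert(s != NULL);

	/* Everything fits, nothing to cut. */
	if (len <= max)
		return len;

	cut = max;
	/*
	 * If the byte at the cut position is a continuation byte
	 * (10xxxxxx), cutting here would split a character, so step
	 * back to the first byte of that character.
	 */
	while (cut > 0 && (s[cut] & 0xC0) == 0x80)
		cut--;

	return cut;
}
```

A caller that wants to append a fixed-size suffix (such as a CRC and an
extension) would cut at utf8_safe_prefix(buf, len, room) instead of at
a fixed offset, accepting that the result may be a few bytes shorter
than the available room.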

Because of the last item above, it looks like all the checks and
conversions (re-coding and possible CRC insertion) should be done
together in a single function. This means that the listed issues
cannot be fixed independently of each other. So, the whole
conversion and translation support should be reworked.
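
As a rough illustration of why the conversion itself has to report
truncation, here is a simplified userspace sketch. It is not the code
from this series: the function names are hypothetical, only UTF-8 output
is handled, and surrogate pairs in 16-bit input are not combined. It
assumes the usual CS0 layout, where the first byte is the compression ID
(8 for one byte per character, 16 for two big-endian bytes per character).

```c
#include <assert.h>
#include <stddef.h>

/* Number of UTF-8 bytes needed for a code point below 0x10000. */
size_t utf8_len_sketch(unsigned int c)
{
	return c < 0x80 ? 1 : c < 0x800 ? 2 : 3;
}

/*
 * Hypothetical sketch: convert a CS0 name to UTF-8 with a bounded output.
 * Returns the number of bytes written and sets *truncated when at least
 * one character did not fit, so that the caller knows a CRC is needed.
 * Returns 0 for empty input or an unknown compression ID.
 */
size_t cs0_to_utf8_sketch(const unsigned char *in, size_t in_len,
			  unsigned char *out, size_t out_len, int *truncated)
{
	size_t step, i, o = 0;

	assert(in != NULL && out != NULL && truncated != NULL);
	*truncated = 0;

	if (in_len == 0 || (in[0] != 8 && in[0] != 16))
		return 0;
	step = (in[0] == 8) ? 1 : 2;

	for (i = 1; i + step <= in_len; i += step) {
		unsigned int c = in[i];
		size_t n;

		if (step == 2)
			c = (c << 8) | in[i + 1];

		/* Check the whole character against the buffer end. */
		n = utf8_len_sketch(c);
		if (o + n > out_len) {
			*truncated = 1;
			break;
		}

		if (n == 1) {
			out[o++] = (unsigned char)c;
		} else if (n == 2) {
			out[o++] = (unsigned char)(0xC0 | (c >> 6));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		} else {
			out[o++] = (unsigned char)(0xE0 | (c >> 12));
			out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}

	return o;
}
```

With an 8-bit CS0 name consisting of three 0xE9 characters (4 input
bytes including the compression ID) and a 4-byte output buffer, only two
characters fit and the truncated flag is set, although the input and
output buffers have the same size in bytes.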

The proposed implementation below fixes the listed issues, and also has
some additional features:

- it gets rid of "struct ustr", since it actually just makes an unneeded
extra copy of the buffer and does not have any other significant
advantage;

- it unifies UTF8 and NLS conversion support, since there is not much
sense in separating these cases;

- the UDF_NAME_LEN constant is adjusted to better reflect actual restrictions.


Andrew Gabbasov (7):
udf: Prevent buffer overrun with multi-byte characters
udf: Check output buffer length when converting name to CS0
udf: Parameterize output length in udf_put_filename
udf: Join functions for UTF8 and NLS conversions
udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
udf: Remove struct ustr as non-needed intermediate storage
udf: Merge linux specific translation into CS0 conversion function

fs/udf/namei.c | 16 +-
fs/udf/super.c | 38 ++--
fs/udf/udfdecl.h | 21 +-
fs/udf/unicode.c | 620 ++++++++++++++++++++++---------------------------------
4 files changed, 281 insertions(+), 414 deletions(-)

--
2.1.0