Re: [PATCH] Two bugs in fs/isofs

From: Thomas Schmitt
Date: Wed Oct 21 2015 - 09:59:30 EST


Hi,

I wrote:
> > Truncation nowadays has to take into account that UTF-8 characters
> > may consist of multiple bytes, and should avoid leaving incomplete
> > byte sequences.
> > (Does the kernel have a function for this ?)

Jan Kara wrote:
> Well, such truncation function would have to be specific to encoding the fs
> uses.

But the problem of truncating a string that may contain multi-byte
UTF-8 characters is generic.

Rock Ridge gives no clue about the character set used for the names.
(libisofs can record it via its SUSP protocol AAIP.)
Nowadays most Unix-like systems use UTF-8 anyway.
So if we truncate, we should avoid leaving byte sequences which would
demand more bytes to follow if interpreted as UTF-8.
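
A minimal sketch of such an encoding-agnostic truncation, assuming only
that the name might be UTF-8. The helper name utf8_safe_trunc_len is
hypothetical and not an existing kernel function. It looks at the bytes
just before the cut and backs off if they form the start of a multi-byte
sequence that the cut would leave incomplete:

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the largest length <= max_len at which
 * name[0..len) does not end inside a multi-byte UTF-8 sequence.
 * Input that is not valid UTF-8 is simply cut at max_len.
 */
static size_t utf8_safe_trunc_len(const unsigned char *name,
                                  size_t name_len, size_t max_len)
{
    size_t len, start, need, have;
    unsigned char lead;

    if (name_len <= max_len)
        return name_len;            /* fits, nothing to truncate */
    len = max_len;

    /* Step back over up to 3 trailing continuation bytes (10xxxxxx). */
    start = len;
    while (start > 0 && len - start < 3 &&
           (name[start - 1] & 0xC0) == 0x80)
        start--;
    if (start == 0)
        return len;                 /* no lead byte before the cut */

    /* The byte before the continuation run decides the sequence length. */
    lead = name[start - 1];
    if ((lead & 0xE0) == 0xC0)
        need = 2;                   /* 110xxxxx: 2-byte sequence */
    else if ((lead & 0xF0) == 0xE0)
        need = 3;                   /* 1110xxxx: 3-byte sequence */
    else if ((lead & 0xF8) == 0xF0)
        need = 4;                   /* 11110xxx: 4-byte sequence */
    else
        return len;                 /* ASCII or not a lead byte */

    have = len - (start - 1);       /* bytes of that sequence we keep */
    if (have < need)
        return start - 1;           /* drop the incomplete sequence */
    return len;
}
```

The function never reads at or beyond max_len, and it shortens the
result by at most 3 bytes. For names in other encodings it may drop a
few bytes that merely look like an incomplete UTF-8 sequence.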


> > The truncated names are not necessarily unique within the
> > directory.

> Well, true but is it worth the bother? I mean realistically, do people use
> media with more than 255 characters in a file name or is it mostly a
> theoretical concern?

One can easily produce such names with genisoimage.
libisofs refuses to produce names longer than 255 bytes.

It depends on the local filesystems whether such names can be
present in backup situations. Home user backup is my motivation
to care about ISO 9660 and optical drives.
So I had to implement a proper truncation in order to get the
minimum fidelity needed for backups.

I doubt anybody types 250+ bytes by hand. But in the three-byte
UTF-8 range, we reach the 255-byte limit with 85 characters.
Also, there may be automated tools with insane ideas about file naming.


The problem is that there will be no way to access the second
file of a pair whose names become identical. One can study the
behavior now with two names of length 254 which differ only in bytes
near their end. The heavy truncation helps to create non-unique names.

One could use libisofs, e.g. via xorriso, to copy such inaccessible
files out of the ISO onto hard disk. (Provided my truncation method
is as good as I hope.)

The simplest way to get unique names would be mount(8) option
"norock". Then you get to see Joliet names or ISO 9660 names of
harmless length. But guessing the original name from an ISO 9660
name can then be an adventure of its own.
The MD5 suffix of libisofs would allow one to compute the truncated
name from the known original name.
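
The general idea of a hash suffix can be sketched as follows. This is
not the libisofs implementation: to keep the example short it uses
64-bit FNV-1a instead of MD5, and the suffix layout (':' plus 16 hex
digits) and the helper name truncate_with_hash are assumptions made
for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a, 64 bit: a stand-in for MD5 to keep this sketch short. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;     /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;              /* FNV prime */
    }
    return h;
}

#define HASH_SUFFIX_LEN 17                  /* ':' plus 16 hex digits */

/*
 * Hypothetical helper: write a name of at most max_len bytes into out,
 * which must provide at least max_len + 1 bytes. Names that fit are
 * copied unchanged. Longer names keep their first max_len - 17 bytes
 * and get a suffix computed from the whole original name.
 * Returns 0 on success, -1 if max_len is too small for the suffix.
 */
static int truncate_with_hash(const char *name, size_t max_len, char *out)
{
    size_t name_len = strlen(name);
    size_t keep;

    if (name_len <= max_len) {
        memcpy(out, name, name_len + 1);
        return 0;
    }
    if (max_len < HASH_SUFFIX_LEN)
        return -1;
    keep = max_len - HASH_SUFFIX_LEN;
    memcpy(out, name, keep);
    snprintf(out + keep, HASH_SUFFIX_LEN + 1, ":%016llx",
             (unsigned long long)fnv1a64((const unsigned char *)name,
                                         name_len));
    return 0;
}
```

Because the suffix depends on the complete original name, two long
names which differ only near their end get different truncated names,
and anybody who knows the original name can recompute the truncated
one. The cut at "keep" bytes does not look at UTF-8 sequence
boundaries; a real implementation would have to take care of that
as well.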


Have a nice day :)

Thomas
