Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

From: Pali RohÃr
Date: Sat Dec 28 2019 - 15:05:37 EST


On Saturday 28 December 2019 12:14:57 Andreas Dilger wrote:
> > On Dec 28, 2019, at 7:36 AM, Pali RohÃr <pali.rohar@xxxxxxxxx> wrote:
> >
> > Hello!
> >
> > I see that you have introduced in commit 62750d0 two new IOCTLs for
> > filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL.
> >
> > I would like to ask, are these two new ioctls mean to be generic way for
> > userspace to get or set fs label independently of which filesystem is
> > used? Or are they only for btrfs?
> >
> > Because I was not able to find any documentation for it, what is format
> > of passed buffer... null-term string? fixed-length? and in which
> > encoding? utf-8? latin1? utf-16? or filesystem dependent?
>
> It seems that SETFSLABEL is supported by BtrFS and XFS, and GETFSLABEL
> also by GFS2. We were just discussing recently about adding it to ext4,
> so if you wanted to submit a patch for that it would be welcome.

I have in-progress patches for FAT with following ioctls:

#define FAT_IOCTL_GET_VOLUME_LABEL(len) _IOC(_IOC_READ, 'r', 0x14, len) /* variable length */
#define FAT_IOCTL_SET_VOLUME_LABEL _IOC(_IOC_WRITE, 'r', 0x15, 0) /* variable length, nul term string */

GET ioctl macro takes length of buffer and pass it via standard ioctl
bits into kernel would know how many bytes can copy to userspace.

And via SET ioctl is passed null term string, so kernel copy from
userspace whole null term string (with some upper limit to page size or
1024 bytes).

I plan to finish FAT patches in next month.

> It looks like the label is a NUL-terminated string, up to the length
> allowed by the various filesystems. That said, it seems like a bit of
> a bug that the kernel will return -EFAULT if a string shorter than the
> maximum size is supplied (256 chars for btrfs).

FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls now seems like a nice
general / unified approach, but it has problems which I already mention
in questions in previous email.

There are basically two kinds of filesystems when looking how filename /
label should be represented.

First group: file name / label is sequence of bytes (possible null
termined) without any representation. It can be in any encoding, it is
up to the user. There is no mapping to Unicode. Examples: ext4

Second group: file name / label is stored in exact encoding (defined
internally or externally by e.g. code page). There is exact translate
function to Unicode. For these file system there is either iocharset= or
nls= mount option which maps name / label to bytes. Examples: vfat
(UTF-16LE), ntfs (UTF-16LE), udf (Latin1 and UTF-16BE), iso9660 with
joliet (UTF-16LE), apfs (UTF-8).

All mentioned file systems by you are in first group, so you probably
have not hit this "implementation" problem for FS_IOC_GETFSLABEL yet.

And question is, what should FS_IOC_GETFSLABEL returns for file systems
from second group?

In my implementation for FAT_IOCTL_GET_VOLUME_LABEL I'm reading label
buffer from filesyetem, then converting this buffer from encoding
specified in codepage= mount option to Unicode and then from Unicode I'm
converting it to encoding specified in iocharset= mount option. Why?
Because same procedure is used for reading file names from vfat when
corresponding file name does not have long UTF-16LE vfat entry (label
on vfat really is not in UTF-16LE but in OEM code page). And I think
that volume label should be handled in same way like file names.

Is above behavior "label in ioctl would be retrieved in same encoding as
file names" correct / expected also for FS_IOC_GETFSLABEL?

Or do you have better idea how should filesystems from second group
implement FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL? (E.g. always in UTF-8
independently of mount options? Or in filesystem native encoding, so
sometimes in UTF-16LE, sometimes in UTF-16BE, sometimes in DOS OEM code
page, sometimes in Latin1? Or something different?)

> The copy_{to,from}_user() function will (I think) return the number of
> bytes remaining to be copied, so IMHO it would make sense that this is
> compared to the string length of the label, and not return -EFAULT if
> the buffer is large enough to hold the actual label. Otherwise, the
> caller has to magically know the maximum label size that is returned
> from any filesystem and/or allocate specific buffer sizes for different
> filesystem types, which makes it not very useful as a generic interface.

I think that my approach with specifying length in ioctl bits for GET
ioctl is more flexible. But I do not know if it is even possible to
change as API is already in kernel.

Also I see there more important bug in that API:

#define FSLABEL_MAX 256 /* Max chars for the interface; each fs may differ */

git show 62750d040bd13

"This retains 256 chars as the maximum size through the interface, which
is the btrfs limit and AFAIK exceeds any other filesystem's maximum
label size."

It has some artificial limit for volume label length. Which filesystems
was checked that maximal label length cannot be larger?

E.g. on UDF filesystem (supported by Linux kernel for a long time) is
label stored in 128bit buffer. First byte specifies Unicode encoding,
last byte specifies used length of buffer. Two possible encodings are
currently defined: Latin1 and UTF-16BE. Therefore if FS_IOC_GETFSLABEL
for UDF would use UTF-8 encoding, then maximal length for label is:
(128-2)*2 = 252 plus null byte (*2 because all non-ASCII Latin1
characters which are one-byte are encoded in UTF-8 by two bytes). Which
is almost upper limit of API. And this is just first filesytem which
I checked. I would not be surprised if there is already filesystem which
maximal length exceed current limit defined by kernel API.

--
Pali RohÃr
pali.rohar@xxxxxxxxx

Attachment: signature.asc
Description: PGP signature