Breaking userspace? Re: 3.0.24 broke aufofs on mixed 32/64bit environment

From: Michael Tokarev
Date: Mon Apr 23 2012 - 14:07:22 EST


On 23.04.2012 20:55, Michael Tokarev wrote:
> On 23.04.2012 20:29, Michael Tokarev wrote:
>> On 23.04.2012 13:19, Ian Kent wrote:
>>> On Sun, 2012-04-22 at 19:22 +0400, Michael Tokarev wrote:
>>>> This is JFYI for now, I only diagnozed the issue but had
>>>> no time to see what's going on.
>>>>
>>>> The short story is that with 3.0.24 and later kernels,
>>>> 32bit autofs stopped working on 64bit kernels, while
>>>> it worked before just fine.
>>>>
>>>> The userspace is of version 5.0.4. Platform is x86.
>>>>
>>>> The sympthoms of the issue is that automount daemon,
>>>> on every access to a (not yet mounted) autofs mount
>>>> point, stalls forever, with the process trying to
>>>> access it stalling forever too -- this is why the
>>>> system didn't reboot, -- it stayed on an attempt to
>>>> access one of automount subdirs, somewhere in the
>>>> middle.
>>
>> This is what happens with 3.0.24+:
>>
>> [pid 15207] read(4, "\5\0\0\0\3\0\0\0\2\0\0\0\35\0\0\0@\17\246\3\0\0\0\0\350\3\0\0\350\3\0\0"..., 304) = 300
>> [pid 15207] read(4, <unfinished ...> -- forever.
>>
>> It wants to read 304 bytes, but kernel only supplies 300
>> bytes.
>>
>> $ file /usr/sbin/automount
>> /usr/sbin/automount: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
>> $ uname -a
>> Linux gandalf 3.0.0-amd64 #3.0.27 SMP Tue Apr 3 16:23:45 MSK 2012 x86_64 GNU/Linux
>>
>> This is a32744d4abae24572eff7269bc17895c41bd0085
>> "autofs: work around unhappy compat problem on x86-64",
>> included in 3.0.24:
>>
>> +static noinline size_t autofs_v5_packet_size(struct autofs_sb_info *sbi)
>> +{
>> + size_t pktsz = sizeof(struct autofs_v5_packet);
>> +#if defined(CONFIG_X86_64) && defined(CONFIG_COMPAT)
>> + if (sbi->compat_daemon > 0)
>> + pktsz -= 4;
>> +#endif
>> + return pktsz;
>> +}
>> +
>>
>> "End result: the "packed" size of the structure is 300 bytes: 4-byte, but
>> not 8-byte aligned.
>>
>> As a result, despite all the fields being in the same place on all
>> architectures, sizeof() will round up that size to 304 bytes on
>> architectures that have 8-byte alignment for u64."
>>
>> So I'm not sure how it worked before this change (from the
>> description of the patch it looks like it should not work).
>> But it definitely worked for me, and this patch broke it
>> for me.
>
> It worked because the userspace apparently does the "rigtht"
> think already compensating for the kernel bitness mismatch.
> This is a bit funky too, but it should work:
>
> static size_t get_kpkt_len(void)
> {
> size_t pkt_len = sizeof(struct autofs_v5_packet);
> struct utsname un;
>
> uname(&un);
>
> if (pkt_len % 8) {
> if (strcmp(un.machine, "alpha") == 0 ||
> strcmp(un.machine, "ia64") == 0 ||
> strcmp(un.machine, "x86_64") == 0 ||
> strcmp(un.machine, "ppc64") == 0)
> pkt_len += 4;
>
> }
>
> return pkt_len;
> }
>
> So it looks like the kernel/user interface had an error,
> userspace adopted to the kernel by doing something, so
> it started working. And next kernel adopted to the un-
> patched userspace, thus breaking patched userspace.
>
> That's quite messy... :( And I'm not sure what to do
> about this, -- the only solution - in my opinion anyway -
> is to revert this kernel patch and maybe switch to a
> new protocol version.

Okay, even if messy, there are quite easy pure userspace
solutions to all this.

But first of all, I want to question this change in the
first place, hence CC'ing to LKML too.

Kernel has been shipping with this brokeness for quite
some time, namely, since introduction of autofs4, dated
Mon Mar 27 01:14:55 2006 -0800 (commit 5c0a32fc2cd0).

This is 6 whole years by now.

Main userspace user of this interface adopted long ago
too, and has been in wide use for years as well.

This kernel change, even if right in theory, actually
breaks all existing userspace which was working fine
for years.

It looks like this is very much against main linux
principle of not breaking stuff even if the change is
"obviously right".

So it pretty much looks like this very fix has to be
reverted in mainline and in all stable kernels. At least
that's what I'd do if you ask me.


Now, back to "pure userspace" solution.

The kernel sends data to userspace, and the unit of
exchange is pretty small - this structure of either
300 or 304 bytes. Can the read from userspace EVER
be partial, so that a fullread() function is necessary?
I'm not sure, but to me it looks like this fullread()
loop isn't needed in the first place, and just single
read() (maybe repeating it in case of EINTR) should
do the job already. This way, we either handle the
data we've read, if we think the amount we read
is sufficient, and the packet looks sane, or just
error out. The filename is at the end of the struct,
so actually there's no need to send whole struct from
kernel, it is sufficient for the kernel to write up
to the trailing zero, and userspace _may_ be able to
understand (and no, this is not what userspace expects
currently, so changing kernel to do so will break things --
I'm just saying what can be done in _userspace_).

Now, we know that this sizeof(v5_packet) is designed
wrongly. But it is actually trivial for the userspace
to expect EITHER 300 bytes OR 304 bytes from the kernel --
again, all in one go. For it to work, extend the struct
with a padding at the end so it will be real 304 bytes,
and request reading of these 304 bytes, but handle the
read of 300 bytes too as successful.

This way, it will work for BOTH "bad" and "good" kernel.

Ofcourse it all breaks again if, in case of several
requests going from the kernel, kernel will batch them
into single larger read, so we'll lose packet boundary.
If that's the case, my "solution" does not work obviously.

Still, the userspace may adapt again, by doing a probe,
forcing the kernel to send us a request before making
the mountpoint visible in the final location, checking
the size of the data kernel sends, and setting that
pkt_len variable to this value. Ugly, but it will let
the application to work in either case.

But again, to me the current situation with outright
BREAKING of existing userspace is much uglier since
it affects many users instead of a single userspace
component...

Thanks,

/mjt

>> Should 3.3 work? As far as I can see, it includes the
>> same change (which has been backported!), so it may have
>> the same _issue_...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/