Re: [PATCH v2] x86: Fix x32 System V message queue syscalls

From: Harald van Dijk
Date: Tue Aug 01 2023 - 08:13:54 EST

On 01/08/2023 03:53, Rich Felker wrote:
On Tue, Aug 01, 2023 at 02:38:47AM +0100, Jessica Clarke wrote:
On 1 Aug 2023, at 01:43, Harald van Dijk <harald@xxxxxxxxxxx> wrote:

On 06/12/2020 22:55, Andy Lutomirski wrote:
On Sat, Dec 5, 2020 at 4:01 PM Jessica Clarke <jrtc27@xxxxxxxxxx> wrote:

Can you submit patches implementing my proposal? One is your existing
patch plus fixing struct msghdr, with Cc: stable@xxxxxxxxxxxxxxx at
the bottom. The second is a removal of struct msghdr from uapi,
moving it into include/linux (no uapi) if needed. The second should
not cc stable.


This looks like it was forgotten, but it is still needed. Jessica,
are you interested in submitting the requested change? If not,
would it be okay if I do so? I have been running this locally for
a long time now.

Please feel free to; sorry that it dropped off my radar. Part of the
issue is my laptop no longer being x86, making it more annoying to test.

No worries and thanks, I will do so.

There is one complication that I think has not been mentioned yet:
when _GNU_SOURCE is defined, glibc does provide a definition of
struct msghdr in <sys/msg.h> with a field "__syscall_slong_t
mtype;". This makes it slightly more likely that there is code out
there in the wild that works fine with current kernels and would
be broken by the fix. Given how rare x32 is, and how rare message
queues are, this may still be acceptable, but I am mentioning it
just in case this would cause a different approach to be
preferred. And whatever is done, a fix should also be submitted to

Given POSIX is very clear on how msghdr works I think we have to break
whatever oddball code out there might be using this. The alternative is
violating POSIX in a way that makes correct code compile fine but fail
at run time on x32, which is a terrible place to be, especially when
the “fix” is to special-case x32 to go against what POSIX says. I just
can’t see how that’s a good place to stay in, even if something might
break when we fix this bug.

Absolutely. The application-facing API absolutely needs to have the
type of mtype be whatever long is in the application-facing C ABI.
However, I'm not sure how best to fix this.

I shall go with Andy's suggested approach.

A fix now still leaves
applications broken on all existing kernels in the wild.

True, but fixing it any other way also leaves applications broken on all
existing kernels in the wild, and fixing it this way makes it so that
existing applications that are currently broken start to work
automatically once people move to new kernels, rather than requiring
rebuilds.

This might be
a place where libc should have x32-specific translation code to work
around the wrong kernel ABI that became the contract with the kernel.

The problem is that there are two conflicting contracts, the de jure
contract and the de facto contract. The de jure contract has always been
that the field has type "long" and we have seen from the breakage that
that is what applications have been using already. The de facto contract
was different, but we do not know of any application that has made use
of this. We cannot make it so that both work, so it makes sense to me to
make it so that what we do know is out there works.
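
To make the two contracts concrete, here is a minimal sketch of the two
buffer layouts. The struct names are hypothetical, and int64_t stands in
for a type field that is always 64 bits wide (as __syscall_slong_t is on
x32); this is an illustration, not the kernel's or glibc's actual
definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* De jure layout: what a POSIX-conforming application declares. */
struct app_msg {
    long mtype;       /* 4 bytes on x32, 8 bytes on x86-64 */
    char mtext[16];
};

/* De facto layout (assumption for illustration): the type field is
 * always 64 bits wide, regardless of the size of long. */
struct wide_msg {
    int64_t mtype;
    char mtext[16];
};

/* Print where each layout expects the message text to start. */
static void print_layouts(void)
{
    printf("long mtype:    mtext at offset %zu\n",
           offsetof(struct app_msg, mtext));
    printf("int64_t mtype: mtext at offset %zu\n",
           offsetof(struct wide_msg, mtext));
}
```

Where long is 64 bits the two offsets agree. Where long is 32 bits, as
on x32, they differ by four bytes, so the first four bytes of the
application's text would be read as the upper half of the type.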

I'm not sure how practical this is, since it seems like it would
require a temp buffer. Is the message size sufficiently bounded to
make that reasonable? Should there be a new x32-specific syscall that
takes the right ABI so that translation is only needed on old kernels?

If a libc wishes to detect the current kernel behaviour and implement a
workaround, can it technically not also do so without a new syscall by
just issuing the syscall with a known payload and seeing what comes
back?
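
A rough sketch of such a probe follows. It assumes the old behaviour is
that the kernel reads a 64-bit type from the start of the buffer, so
nonzero text bytes leak into the stored type and a receive for type 1
then finds nothing. The names probe_msg and probe_mtype_is_long are
hypothetical, and a real libc would issue the raw syscalls rather than
go through its own wrappers.

```c
#include <errno.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct probe_msg {
    long mtype;
    char mtext[16];   /* 8 bytes are sent; the rest is slack in case
                         the kernel reads or writes 4 bytes further */
};

/* Returns 1 if mtype round-trips as a long, 0 if the kernel appears to
 * use a wider type field, and -1 if the probe is inconclusive. */
static int probe_mtype_is_long(void)
{
    struct probe_msg out, in;
    int result = -1;
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

    if (id < 0)
        return -1;

    /* Nonzero text bytes: if the kernel takes the first four of them
     * as the upper half of the type, the stored type is no longer 1. */
    memset(&out, 0x01, sizeof out);
    out.mtype = 1;
    memset(&in, 0, sizeof in);

    if (msgsnd(id, &out, 8, IPC_NOWAIT) == 0) {
        ssize_t n = msgrcv(id, &in, 8, 1, IPC_NOWAIT);

        if (n == 8 && in.mtype == 1 &&
            memcmp(in.mtext, out.mtext, 8) == 0)
            result = 1;          /* payload came back unchanged */
        else if (n < 0 && errno == ENOMSG)
            result = 0;          /* no message of type 1 was stored */
    }

    msgctl(id, IPC_RMID, NULL);  /* always remove the private queue */
    return result;
}
```

The queue is created with IPC_PRIVATE and removed before returning, so
the probe leaves nothing behind. A result of -1 covers cases such as
System V IPC being unavailable.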

But personally, I would be happy to leave that as it is now under Andy's
rationale: "If you run user programs on a buggy kernel, you get buggy
behavior..."

Harald van Dijk