Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone

From: Ingo Molnar
Date: Thu Apr 23 2015 - 02:24:50 EST



* Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote:

> On Wed, Apr 22, 2015 at 11:22:02AM -0700, Linus Torvalds wrote:
> > On Wed, Apr 22, 2015 at 10:10 AM, Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> > > this
> >
> > Ugh, I absolutely detesrt that patch.
> >
> > Don't make random crazy function signatures that depend on some config
> > option. That's just evil. The patch is a mess of #ifdef's and should
> > be shot in the head and staked with a silver stake to make sure it
> > never re-appears.
> >
> > Either:
> >
> > (a) make the change for every architecture
> >
> > (b) have side-by-side interfaces. With different names!
>
> ...that's exactly what I did. They're called copy_thread and
> copy_thread_tls; I very intentionally did not conditionally change
> the signature of copy_thread, for exactly that reason. Those
> functions are implemented in architecture-specific code, so the
> config option just specifies which of the two functions the
> architecture provides.
>
> *sys_clone* has different function signatures based on config
> options, but I didn't touch that other than fixing the type of the
> tls argument. That's historical baggage that we can't throw away
> without breaking userspace.

So you want to add a new clone() parameter. I strongly suspect that
you won't be the last person who wants to do this.

So why not leave the compatibility baggage alone and introduce a new
clean clone syscall with a flexible, backwards and forwards compatible
parameter ABI?

Something like:

SYSCALL_DEFINE1(clone_params, struct clone_params __user *, params)

struct clone_params {
__u32 size; /* User-space sets it to sizeof(struct clone_params) */

__u32 tls_val; /* Slightly out of order, for optimal structure packing */
__u64 clone_flags;
__u64 new_sp_addr;
__u64 parent_tid_addr;
__u64 child_tid_addr;
__u64 tls;
};

The only real cost of this approach is that this parameter structure
has to be copied (once): should be fast as it fits into a single cache
line and is a constant size copy. Also note how this parameter block
can be passed down by const reference from that point on, instead of
copying/shuffling 5-6 parameters into increasingly long function
parameter lists. So it might in fact turn out to be slightly faster as
well.

Note how easily extensible it is: a new field can be added by
appending to the structure, and compatibility is achieved in the
kernel without explicit versioning, by checking params->size:

params->size == sizeof(*params):

Good, kernel and userspace ABI version matches. This is a simple
check and the most typical situation - it will be the fastest as
well.


params->size < sizeof(*params):

Old binary calls into new kernel. Missing fields are set to 0.
Binaries will be forward compatible without any versioning hacks.


params->size > sizeof(*params):

New user-space calls into old kernel. System call can still be
serviced if the new field(s) are all zero. We return -ENOSYS if
any field is non-zero. (i.e. if new user-space tries to make use
of a new syscall feature that this kernel has not implemented
yet.) This way user-space can figure out whether a particular
new parameter is supported by a kernel, without having to add new
system calls or versioning.

Also note how 'fool proof' this kind of 'versioning' is: the parameter
block is extended by appending to the end, cleanly extending the
kernel internals to handle the new parameter - end of story. The
zeroing logic on size mismatch will take care of the rest
automatically, it's all ABI forwards and backwards compatible to the
maximum possible extent.

It's relatively easy to use this same interface from 32-bit compat
environments as well: they can use the very same parameter block, the
kernel's compat layer possibly just needs to do a minor amount of
pointer translation for the _addr fields, for example to truncate them
down to 32 bits for security, by filling in the high words of pointers
with 0. (This too can be optimized further if needed, by open coding
the copy and zeroing.) This too is forwards and backwards ABI
compatible to the maximum possible extent.

So introduce a single new syscall with the right ABI architecture and
you can leave the old mistakes alone.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/