Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test

From: Ingo Molnar
Date: Mon Sep 14 2015 - 04:15:48 EST



* Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:

> >> + /* INT80 syscall entrypoint can be used by
> >> + * 64-bit programs too, unlike SYSCALL/SYSENTER.
> >> + * Therefore it must preserve R12+
> >> + * (they are callee-saved registers in 64-bit C ABI).
> >> + *
> >> + * This was probably historically not intended,
> >> + * but R8..11 are clobbered (cleared to 0).
> >> + * IOW: they are the only registers which aren't
> >> + * preserved across INT80 syscall.
> >> + */
> >> + if (*r64 == 0 && num <= 11)
> >> + continue;
> >
> > Ugh. I'll change my big entry patchset to preserve these and maybe to
> > preserve all of the 64-bit regs.
>
> If you do that, this won't change the ABI: we don't _promise_
> to save them. If we accidentally do, that means nothing.

Argh, that's dangerous nonsense! You _still_ don't seem to understand what the
Linux ABI means and how to change code that implements it...

Let me try one more time, and please make the mental effort to understand and
internalize it, because I'm not going to apply any more ABI affecting patches from
you if you don't...

When you read 'ABI' you always need to write it out mentally:

_Application_ Binary Interface

Do you see the word 'kernel developer' in that phrase? I don't, and this really
matters: what an ABI is does not depend on what the kernel developers think or
thought it means. Our state of mind, our design preferences, our documentation and
our intentions are absolutely irrelevant.

What an ABI is purely depends on is whether applications break or change behavior
as a result of us changing it.

What an ABI depends on is how applications (pretty much any real application or
library, no matter how rare) learned to use the kernel's previous implementation
of the ABI, in released kernels - in every tiny detail of that ABI.

If _any_ application/library functionally relied on saving registers in this place
- even if that behavior was 'accidental' from the kernel developer's POV, and if
the app would break or misbehave if we changed that detail, then that's an
'accidental ABI', which is an ABI just as much!!!

This is why we preserve old quirks and this is why we try to be 110% careful
around ABIs, especially low level ones that expose a big set of hardware registers
that applications could rely on in unexpected ways.

Not because we couldn't "fix" them, but because an ABI is a long term contract: we
promise absolutely no breakage to applications, no matter what.

Mistaken ABIs, unintended ABIs, deterministically quirky ABIs that apps/libs
learned to rely on, even if the ABI usage in the app itself is just a bug or is
otherwise unintended, has _exactly_ the same ABI strength as our sys_close() ABI.

There are some very, very rare expections, but none of them apply here:

- If an ABI component behaved non-deterministically (relying on random stack
content for example) then we cannot always fully preserve it when fixing it -
but we'll generally gravitate towards trying to fix it while preserving
existing apps.

- Sometimes, 1 out of 100,000 kernel bugs, preserving an old, rare ABI would mean
introducing or allowing a serious security hole. There's been less than 3 such
cases in all of the 500,000+ commits in the Linux kernel that I'm aware of, so
this is not a case you need to even know about.

- Testcases whose only purpose is to bake in every single silly detail from an
exposed ABI generally don't count. Only if a real app/lib that people use is
affected counts. (There are exceptions even to this exception: if the ABI
change that breaks a test-case is obviously gratuitous then the ABI counts:
because reasonably there _might_ be apps out there that rely on the reasonable
old behavior, they just haven't been reported yet.) Obviously an ABI test-case
failing is a red flag, and by default we should assume we messed up and need to
unbreak the ABI - unless we are absolutely convinced it's OK. And even if we
think it's absolutely OK it takes just a single regression report of an
application breaking to revert an ABI change - no questions asked.

> (I'm not very comfortable about additional six push/pops which are necessary for
> this to happen. I'm surprised maintainers tentatively agreed to that - I was
> grilled and asked to prove with measurements that *one* additional push+pop
> wasn't adding significant overhead).

Yeah, so here's our thinking: the code is still not as maintainable as we wish it
to be, so we are willing to temporarily accept those pop/pushes to help along the
C conversion effort, to make the code (much!) more maintainable.

Other changes still have to be justified on an instruction granular manner.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/