[RFC PATCH v3 0/8] Per process PTI activation

From: Willy Tarreau
Date: Wed Jan 10 2018 - 14:29:29 EST


This is the third version of the proposal.

Following the discussions, I went back to using thread_info flags, as they
may be cheaper to check than the per-CPU variable (they are likely already
hot in the cache). They will also later make it possible to allow specific
threads to re-enable protection if desired (not supported yet, as I'm not
totally sure of all the possible impacts).

The prctl is now conditioned on:
- a config option: CONFIG_PER_PROCESS_PTI
- a sysctl, pti_adjust, which takes 3 values:
    -  0 (default): changes to PTI are not permitted
    -  1: changes to PTI are permitted
    - -1: like 0, but cannot be changed anymore

This ensures that users running untrusted code can disable the support
at build time, and that distros can leave it to the admin to decide to
enable it, or to block it until next reboot.

There are now two prctls:
- ARCH_DISABLE_PTI_NOW disables PTI for the current process. It checks
  that mm_users <= 1 before proceeding, and only acts if
  pti_adjust == 1. The setting is cleared on execve().

- ARCH_DISABLE_PTI_NEXT disables PTI after the next execve(). It doesn't
  change the current process' state; upon the next execve() it sets
  ARCH_DISABLE_PTI_NOW and is itself cleared. It's meant for wrappers.
  I lazily copied the checks from the first one, so it also checks that
  mm_users <= 1. That doesn't really make sense here, but since this
  prctl only exists to build simple wrappers, it's unlikely to be called
  from a threaded process. I don't see any particular risk in removing
  this test, though.

The GET prctl was dropped as useless (was there just as a debugging aid).

What remains to be done :
- _PAGE_NX is still commented out for now. I'll need some help here
  if we have to catch a page fault to deal with it. Ingo apparently
  suggested that it probably doesn't bring any value anymore on modern
  systems with SMEP.

- I haven't yet added the other values for the system-wide boot options

- nothing done on tainting yet

- documentation

I now find the solution really convenient to use and reassuring at the
same time, since it is disabled by default and can be disabled forever
at runtime. I think we strike a good balance here.

I'm interested in feedback to know whether it's worth pursuing this direction.

I wrote this quick-n-dirty test program, which serves both as a wrapper
and as a benchmark tool to quickly tell whether it works (it performs 3
million write() calls). I tested all combinations of NOW and NEXT with
the various sysctl values, and everything works as expected.

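#include <asm/prctl.h>	/* ARCH_* constants */
#include <stdio.h>
#include <sys/syscall.h>	/* SYS_arch_prctl */
#include <unistd.h>	/* syscall(), write(), execvp() */

#ifndef ARCH_DISABLE_PTI_NOW
#define ARCH_DISABLE_PTI_NOW 0x1021
#endif

#ifndef ARCH_DISABLE_PTI_NEXT
#define ARCH_DISABLE_PTI_NEXT 0x1022
#endif

/* glibc doesn't reliably declare arch_prctl(), so use syscall() */
static long do_arch_prctl(int code, unsigned long arg)
{
	return syscall(SYS_arch_prctl, code, arg);
}

int main(int argc, char **argv)
{
	if (argc < 2) {
		printf("usage: nopti [0|1|2|3] [<cmd> ...]\n");
		return 1;
	}

	/* bit 0 of the mode digit ('1' or '3'): disable PTI now */
	if (argv[1][0] & 1)
		if (do_arch_prctl(ARCH_DISABLE_PTI_NOW, 1) == -1)
			printf("failed PTI_NOW\n");

	/* bit 1 of the mode digit ('2' or '3'): disable PTI after execve() */
	if (argv[1][0] & 2)
		if (do_arch_prctl(ARCH_DISABLE_PTI_NEXT, 1) == -1)
			printf("failed PTI_NEXT\n");

	argv += 2;
	argc -= 2;

	if (!argc) {
		/* no command: run a local loop of 3M cheap, failing syscalls */
		long loops = 3000000;

		while (loops--)
			write(-1, "a", 1);
		return 0;
	}

	/* otherwise act as a wrapper and run the given command */
	execvp(argv[0], argv);
	perror("execvp");
	return 1;
}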


Here are the results in a PCID-enabled VM:

# time ./nopti 3
failed PTI_NOW
failed PTI_NEXT

real 0m0.924s ==> PTI enabled
user 0m0.295s
sys 0m0.627s

# echo 1 > /proc/sys/vm/pti_adjust
# time ./nopti 1

real 0m0.220s ==> PTI disabled
user 0m0.104s
sys 0m0.116s

# time ./nopti 1 ./nopti 0

real 0m0.918s ==> PTI enabled in target process
user 0m0.276s
sys 0m0.640s

# time ./nopti 2

real 0m0.906s ==> PTI enabled
user 0m0.280s
sys 0m0.625s

# time ./nopti 2 ./nopti 0

real 0m0.207s ==> PTI disabled in target process
user 0m0.076s
sys 0m0.131s

# time ./nopti 3

real 0m0.216s
user 0m0.068s
sys 0m0.148s

# su admin
$ id
uid=100(admin) gid=4(adm) groups=4(adm)
$ ./nopti 3
failed PTI_NOW
failed PTI_NEXT

# echo 0 > /proc/sys/vm/pti_adjust
# time ./nopti 1
failed PTI_NOW

real 0m0.875s ==> PTI enabled
user 0m0.308s
sys 0m0.567s

# echo -1 > /proc/sys/vm/pti_adjust
# ./nopti 3
failed PTI_NOW
failed PTI_NEXT

# echo 0 > /proc/sys/vm/pti_adjust
-su: echo: write error: Operation not permitted
# echo 1 > /proc/sys/vm/pti_adjust
-su: echo: write error: Operation not permitted

Willy

Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>


Willy Tarreau (8):
x86/thread_info: add TIF_DISABLE_PTI_{NOW,NEXT} to disable PTI per
task
x86/pti: add new config option PER_PROCESS_PTI
x86/pti: create the pti_adjust sysctl
x86/arch_prctl: add ARCH_DISABLE_PTI_{NOW,NEXT} to enable/disable PTI
exec: take care of disabling PTI upon execve()
x86/pti: don't mark the user PGD with _PAGE_NX.
x86/entry/pti: avoid setting CR3 when it's already correct
x86/entry/pti: don't switch PGD when TIF_DISABLE_PTI_NOW is set

arch/x86/entry/calling.h | 27 +++++++++++++++++++++++++++
arch/x86/include/asm/pti.h | 5 +++++
arch/x86/include/asm/thread_info.h | 13 +++++++++++++
arch/x86/include/uapi/asm/prctl.h | 3 +++
arch/x86/kernel/process_64.c | 30 ++++++++++++++++++++++++++++++
arch/x86/mm/pti.c | 22 ++++++++++++++++++++++
fs/exec.c | 10 ++++++++++
kernel/sysctl.c | 12 ++++++++++++
security/Kconfig | 12 ++++++++++++
9 files changed, 134 insertions(+)
