Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

From: Topi Miettinen
Date: Mon Oct 26 2020 - 12:32:16 EST


On 26.10.2020 16.52, Catalin Marinas wrote:
On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote:
On 23.10.2020 12.02, Catalin Marinas wrote:
On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:
Regardless, it makes sense to me to have the kernel load the executable
itself with BTI enabled by default. I prefer gaining Catalin's suggested
patch[2]. :)
[...]
[2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/

I think I first heard the idea at Mark R ;).

It still needs glibc changes to avoid the mprotect(), or at least ignore
the error. Since this is an ABI change and we don't know which kernels
would have it backported, maybe better to still issue the mprotect() but
ignore the failure.

What about kernel adding an auxiliary vector as a flag to indicate that BTI
is supported and recommended by the kernel? Then dynamic loader could use
that to detect that a) the main executable is BTI protected and there's no
need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC
pages.

We could add a bit to AT_FLAGS, it's always been 0 for Linux.

Great!
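
As a rough sketch of what the loader-side check could look like, assuming a purely hypothetical bit value (no BTI bit is defined in AT_FLAGS; the name and value of AT_FLAGS_BTI_HYPOTHETICAL below are invented for illustration):

```c
#include <elf.h>        /* AT_FLAGS */
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval() */

/* HYPOTHETICAL: no such bit exists in any kernel; the position is
 * arbitrary and only serves to illustrate the idea. */
#define AT_FLAGS_BTI_HYPOTHETICAL (1UL << 8)

/* Pure helper, so the decision can be exercised without a patched kernel. */
static bool at_flags_recommend_bti(unsigned long at_flags)
{
    return (at_flags & AT_FLAGS_BTI_HYPOTHETICAL) != 0;
}

/* Ask the running kernel. getauxval() returns 0 for a missing entry,
 * which reads the same as "bit not set". */
static bool kernel_recommends_bti(void)
{
    return at_flags_recommend_bti(getauxval(AT_FLAGS));
}
```

A kernel that never sets the bit gives the same answer as one that does not know about BTI at all, which is the intended fallback.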

In absence of the vector, the dynamic loader might choose to skip doing
PROT_BTI at all (since the main executable isn't protected anyway either, or
maybe even the kernel is up-to-date but it knows that it's not recommended
for some reason, or maybe the kernel is so ancient that it doesn't know
about BTI). Optionally it could still read the flag from ELF later (for
compatibility with old kernels) and then do the mprotect() dance, which may
trip seccomp filters, possibly fatally.

I think the safest is for the dynamic loader to issue an mprotect() and
ignore the EPERM error. Not all user deployments have this seccomp
filter, so they can still benefit, and user can't tell whether the
kernel change has been backported.
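
A minimal sketch of this "try it and ignore the failure" approach, assuming a hypothetical helper name (try_add_bti) and not glibc's actual code. PROT_BTI is given its arm64 value as a fallback only so the sketch builds on other architectures:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10   /* arm64 value; fallback so this builds elsewhere */
#endif

/* Try to add PROT_BTI to an already mapped range.
 * Returns 0 on success, otherwise the errno from mprotect(), for
 * example EPERM from a seccomp filter that returns an error, or
 * EINVAL where PROT_BTI is not understood. The caller is expected
 * to carry on without BTI in that case.
 * Note: if the filter action is to kill, this never returns. */
static int try_add_bti(void *addr, size_t len, int prot)
{
    if (mprotect(addr, len, prot | PROT_BTI) == 0)
        return 0;
    return errno;
}
```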

But the seccomp filter can be set to kill the process, so that's definitely not the safest way. I think the safest approach is that when ld.so sees the AT_FLAGS bit, it makes no mprotect() calls at all and instead adds PROT_BTI to the protection argument of mmap() when mapping the segments, so mprotect() calls are not necessary. If there's no seccomp filter, avoiding the useless mprotect() calls has no disadvantage.
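
Roughly like this, as a sketch only: segment_prot is a hypothetical helper, not an existing ld.so function, and bti_recommended stands for whatever the loader derived from the aux vector. The result would be passed as the prot argument of the mmap() call that maps the segment:

```c
#include <stdbool.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10   /* arm64 value; fallback so this builds elsewhere */
#endif

/* Compute the protection for a segment at map time. PROT_BTI is
 * added only to executable mappings, and only when the kernel has
 * indicated that BTI should be used, so no later mprotect() call
 * is needed. */
static int segment_prot(int prot, bool bti_recommended)
{
    if (bti_recommended && (prot & PROT_EXEC))
        prot |= PROT_BTI;
    return prot;
}
```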

I'd expect a backported kernel change to include both the aux vector flag and the use of PROT_BTI for the main executable. Then the logic would work with backported kernels as well.

If there's no aux vector flag, all bets are off. The kernel could be old and unpatched, even so old that it doesn't know PROT_BTI. Perhaps in the future there will also be new technologies which have replaced BTI, and the kernel may not want a previous-generation ld.so to try to use BTI; this could also be indicated by the absence of the flag. The dynamic loader could still attempt to mprotect() the pages, but that could be fatal. Getting to the point where the error can be ignored means that there's no seccomp filter, or at least none set to kill. Perhaps the pain is only temporary, since new or patched kernels should eventually replace the old versions.

Now, if the dynamic loader silently ignores the mprotect() failure on
the main executable, is there much value in exposing a flag in the aux
vectors? It saves a few (one?) mprotect() calls but I don't think it
matters much. Anyway, I don't mind the flag.

Saving a few system calls is indeed not the issue. The original problem (service failures) was that systemd's MemoryDenyWriteExecute (MDWX) and PROT_BTI could not be used simultaneously.

-Topi