Re: CET shadow stack app compatibility

From: Edgecombe, Rick P
Date: Mon Dec 05 2022 - 14:03:09 EST


On Fri, 2022-12-02 at 19:48 +0100, Florian Weimer wrote:
> * Rick P. Edgecombe:
>
> > For IBT, which seems to be in worse shape than shadow stack from an
> > existing userspace perspective, I have also seen shared objects
> > with
> > issues.
> >
> > For shadow stack, it was just JITing binaries.
>
> Except that the actual JITters are usually in shared objects, too,
> and
> you just assume here that they get loaded by a main program from the
> same build. 8-) I think most of them are reusable independently, or
> are
> bundled into applications built with a different toolchain.

So I guess the scenario would be a SHSTK2 binary that dlopen()s a
broken SHSTK1 DSO (broken because of JITing or whatever) using a future
version of glibc. The outcome would depend on how the future glibc
implementation of SHSTK2 handles this. I can only hope glibc would do
the right thing to avoid whatever situation caused the creation of
SHSTK2.

If the scenario is a SHSTK1 binary that dlopen()s a broken SHSTK1 DSO,
it would already not have shadow stack, because SHSTK1 binaries were
blocked from getting shadow stack enabled.
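
To make the two scenarios concrete, here is a rough sketch of the
decision logic as I understand it. Everything in it is illustrative:
SHSTK2 is a hypothetical marker, and the names Marker,
exec_enables_shstk, broken_dso_runs_with_shstk and the two policy flags
are made up for this sketch. None of them are kernel or glibc
interfaces.

```python
from enum import Enum


class Marker(Enum):
    NONE = 0
    SHSTK1 = 1  # the existing ELF bit
    SHSTK2 = 2  # hypothetical new bit, does not exist today


def exec_enables_shstk(exec_marker, kernel_blocks_shstk1):
    """Does the exec'd binary end up with shadow stack enabled?"""
    if exec_marker is Marker.NONE:
        return False
    if exec_marker is Marker.SHSTK1:
        # Assumed kernel filter: refuse to enable for the old bit.
        return not kernel_blocks_shstk1
    return True  # SHSTK2 is never filtered in this sketch


def broken_dso_runs_with_shstk(exec_marker, dso_marker,
                               kernel_blocks_shstk1,
                               glibc_honors_shstk1_dso):
    """True if a broken DSO ends up running under shadow stack."""
    if not exec_enables_shstk(exec_marker, kernel_blocks_shstk1):
        return False
    if dso_marker is Marker.NONE:
        # Assumption: glibc treats an unmarked DSO as incompatible, so
        # it never runs with shadow stack still enabled.
        return False
    if dso_marker is Marker.SHSTK1 and exec_marker is Marker.SHSTK2:
        # The open question: does a future glibc honor the old bit?
        return glibc_honors_shstk1_dso
    return True
```

With the kernel filter on, broken_dso_runs_with_shstk(Marker.SHSTK1,
Marker.SHSTK1, True, True) is False, which is the second scenario
above. The first scenario (SHSTK2 executable, SHSTK1 DSO) depends
entirely on the glibc_honors_shstk1_dso flag.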

>
> > Of course if glibc is compiled in non-permissive mode there is an
> > additional category of issues around dlopen()ing that we haven't
> > even
> > discussed yet. And the past issues around makecontext() we have
> > already worked around from the kernel. If you are aware of any
> > other
> > specific compatibility problems, please share so we can discuss the
> > extent.
>
> H.J. ran most of the experiments on Fedora. We did some early
> validation many years ago, using the first ABI iteration. We didn't
> have as much reach as we liked in terms of hardening at the time, if
> I
> recall correctly, but there were only very few cases where something
> did
> not work and was also not marked as incompatible.

I think most binaries will work automatically. The problem is the
standard is not "doesn't break *too many* binaries".

>
> > > The posted hack didn't even
> > > deal with that case. If the main executable has the current
> > > markers,
> > > the kernel will not disable shadow stack, and the process will
> > > still
> > > crash after loading the incorrectly marked shared object.
> >
> > The proposed glibc changes would not enable shadow stack unless the
> > execing binary has the elf bit marked. So if we block those
> > binaries
> > (which the kernel can easily check) from enabling shadow stack,
> > none of
> > the linked shared objects will have shadow stack either. So I think
> > we
> > are ok to hold this in our back pocket to resolve the known issues
> > if
> > anyone complains.
>
> See above, the assumption that the JITter and the main program come
> from
> the same build that is implicit in this is not actually true in
> practice.

Hmm, not sure I understand your point. Are you saying that the kernel
can't resolve the found issues by blocking SHSTK1 execing binaries? I
think it can, provided future glibc behaves nicely.

In general, the point that the kernel can't fully stop userspace from
breaking itself is well taken.
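
For reference, the marker in question lives in the binary's GNU
property note, which is why it is cheap to check. Below is a minimal
userspace sketch of that check, assuming you have already extracted the
raw bytes of a little-endian ELF64 .note.gnu.property note. The
function names x86_feature_1 and marked_shstk are made up for
illustration; the constants are the ones from the x86-64 psABI. This is
not how the kernel implements it, just the same idea in a few lines.

```python
import struct

NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_X86_FEATURE_1_AND = 0xC0000002
GNU_PROPERTY_X86_FEATURE_1_IBT = 0x1
GNU_PROPERTY_X86_FEATURE_1_SHSTK = 0x2


def _align_up(value, align):
    return (value + align - 1) & ~(align - 1)


def x86_feature_1(note, align=8):
    """Return the X86_FEATURE_1_AND bits from one note, or 0 if absent."""
    if len(note) < 12:
        return 0
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name_end = 12 + namesz
    if ntype != NT_GNU_PROPERTY_TYPE_0 or note[12:name_end] != b"GNU\0":
        return 0
    off = _align_up(name_end, align)
    end = min(off + descsz, len(note))
    # Walk the property array: (pr_type, pr_datasz, data, padding).
    while off + 8 <= end:
        pr_type, pr_datasz = struct.unpack_from("<II", note, off)
        off += 8
        if (pr_type == GNU_PROPERTY_X86_FEATURE_1_AND
                and pr_datasz == 4 and off + 4 <= end):
            return struct.unpack_from("<I", note, off)[0]
        off += _align_up(pr_datasz, align)
    return 0


def marked_shstk(note):
    """True if the note carries the existing shadow stack bit."""
    return bool(x86_feature_1(note) & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
```

The check only says what the toolchain claimed at link time. It cannot
tell whether the code actually behaves, which is the whole problem with
the mismarked binaries.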

>
> > Where the shared objects could come into play is, in the event that
> > we
> > have to block the old elf bit from the kernel, and a new one is
> > properly marked on a new executable, future glibcs could decide to
> > honor the old bits when checking shared libraries. So you could
> > have an
> > executable with SHSTK2 bit loading a problem SO with just SHSTK1
> > bit.
>
> Right. But we can also have policies in userspace to paper over
> this.
> I'm not worried about it. I want to see how far we can get before
> making the flip in an upstream version of glibc, but if the kernel
> enforces SHSTK2 (even just on executables), I need a toolchain update
> plus a rebuild of a large chunk of the distribution.

Existing GCC versions assume the wrong ABI as well, so it's probably
safest to use an updated toolchain in any case. I wasn't able to find
any binaries that broke because of the GCC issues, but it wasn't an
exhaustive search.

But remember, even that filter patch had a Kconfig to disable it.
Distros with the resources to test everything on SHSTK hardware, and
users that don't build their own glibcs, could probably minimize the
impact. Smaller distros and users would at least not be surprised, or
could wait for SHSTK2 to make its way through.

>
> So with reusing SHSTK1 markup, it goes like this:
>
> 1. Get a Fedora rawhide kernel with userspace SHSTK support.
> 2. Get the glibc patches from H.J., and gate them behind a tunable
> (off by default). Kernel behavior should not change with this
> new glibc because the required arch_prctl does not happen
> (and the old ones currently in glibc have different numbers).
> 3. Run the Fedora graphical desktop with the tunable switched on and
> a few key
> third-party applications to see where we stand in terms of
> compatibility.
> 3b Do the same thing with RHEL and some enterprise applications
> (using the kernel and glibc from 1 & 2 for a start).
> 4. (Optional.) Flip the default of the tunable to on.
>
> I don't know how quickly we can get past step 1, but it seems fairly
> soon, maybe three months, considering the upcoming end-of-year break.
>
> With SHSTK2 markup required by the kernel, it goes like this:
>
> 1. Get a Fedora rawhide kernel with userspace SHSTK support.
> 2. Get a SHSTK2-enabled toolchain. GCC is currently freezing for the
> 13
> release, so this is not a good time of the year for that. It's
> probably going to be a custom compiler, unless we want to wait a
> couple of months, and even then it's got to be a downstream-only
> backport at first because to upstream, this will have a “not
> finished” whiff (it's the umpteenth ABI change).
> 3. Get the glibc patches from H.J. We would probably put it behind
> a tunable as well.
> 4. Rebuild key parts of Fedora, probably directly in rawhide (the
> rolling integration distribution).
> 5. Run the Fedora rawhide graphical desktop etc.
> 6. RHEL testing will require a SHSTK2 port to a different compiler
> and another mass rebuild. ISV application testing is not
> meaningful
> until the ISVs have switched to a newer compiler.
>
> That's going to take much longer than three months. Maybe we have to
> do
> this in the end, but even then, we have no way of forcing developers
> to
> test on SHSTK-capable hardware on new-enough before turning on the
> SHSTK2 bit.
>
> In the end, we might still need SHSTK2, but we don't know that yet,
> and
> the first approach is quite cheap, so I really want to try it.

Yes, this is the working plan at this point. I removed the ELF header
bit filter in the latest revision. I still personally would favor
starting over with SHSTK2 from the beginning, even if it led to a slower
rollout. That would be a feature, not a bug, in my view.

If we do end up needing SHSTK2 though, then it resets the clock and the
rollout is the slowest of the possibilities.

>
> Keep in mind that just because some useful interface is provided by
> the
> kernel, we can't necessarily use it in glibc immediately because with
> all those seccomp filters out there (and other dependencies on
> internal
> glibc/kernel interface details), too much would break if we exposed
> it
> into existing applications without some coordination. SHSTK isn't
> *that* different, except that we have some binary markup to guide us
> at
> run time.

The thing that is rare is that the way it has been rolled out
restricts existing behavior under the nose of the application
developers AND it depends on kernel/HW support. In the analogy of
forced compiler hardening options, as best I can tell (I'm educating
myself on this history only recently), larger distros started doing
this and found and fixed the issues. Then smaller ones picked it up
after that.

With shadow stack, we seem to be well down this path already because of
the lack of kernel support.

>
> > But I still don't see why doing the order:
> > 1. kernel support
> > 2. libc support
> > 3. compiler support
> >
> > ...wouldn't have generated a more normal situation where old
> > binaries
> > don't break against new kernels and testing can easily happen to
> > reduce
> > issues further. So we could still reset and do exactly that.
>
> No matter in which order you do it, some group will want to change
> ABI
> or semantics. We actually had multiple different iterations in
> different orders, and everybody wanted to put their mark onto this
> feature, changing the ABI. I don't care at all about the internal
> ABI
> between glibc and the kernel, but the markup of the binaries (besides
> glibc itself) is quite important to me.

I'm late to this project, but for my changes to the enablement ABI I
really had no choice. I preferred SHSTK2 to resolve the boot problems
too and we did this other ABI change after extreme resistance from the
glibc side. So it was really trying to prevent an insta-revert rather
than putting any marks on anything.

Whatever the spec, we really need to prevent compatibility sensitive
features like this from making it upstream in userspace before the
kernel changes. The kernel has high backwards compatibility standards.
To try to achieve this, it should have flexibility to design its own
ABI. Putting the userspace changes upstream ahead of time for a feature
like this constrains the kernel.

The idea that userspace can finalize all the bits and ABI for future
features and then lie in wait to cause kernel regressions if the
kernel doesn't match is wrong. It also caused these concrete issues. So
hopefully everyone is on the same page about this for the future. I
just want to be clear, in case.

>
> In retrospect, separating SHSTK from IBT from the start would have
> helped a lot because I think we could have done that in libc without
> compiler support. But I don't think anyone expected this to take
> four
> to five years to implement (or probably longer for IBT).
>
> > > Instead, we'd have to
> > > wait for a rebuild with the new markers, and of course this
> > > rebuild
> > > will
> > > put us in exactly the same position as before: the
> > > incompatibilities
> > > will be back because they are no longer masked by the kernel.
> >
> > People building new apps and testing them against upstream kernels
> > and
> > finding issues sounds like business as usual. I'm not trying to
> > solve
> > all possible userspace mistakes from the kernel.
>
> They also have to test on the right hardware and with a
> new/unreleased
> glibc.
>
> I think it would be helpful to those developers if we could give them
> an
> existing distribution early on they can use for experiments. Not
> just
> getting SHSTK going, but also playing with the perf integration
> (which
> to me is the real goal here).
>
>

Agreed. A Kconfig or sysctl would have worked fine for this purpose
though.