Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

From: Edgecombe, Rick P
Date: Sun Jul 02 2023 - 14:04:00 EST


On Thu, 2023-06-29 at 17:07 +0100, szabolcs.nagy@xxxxxxx wrote:
> The 06/22/2023 23:18, Edgecombe, Rick P wrote:
> > I'd also appreciate if you could spell out exactly which:
> >  - ucontext
> >  - signal
> >  - longjmp
> >  - custom library stack switching
> >
> > patterns you think shadow stack should support working together.
> > Because even after all these mails, I'm still not sure exactly what
> > you are trying to achieve.

Hi Szabolcs,

Thanks for writing all this up. It is helpful to understand where you
are coming from. Please don't miss my point at the very bottom of this
response.

>
> i'm trying to support two operations (in any combination):
>
> (1) jump up the current (active) stack.
>
> (2) jump to a live frame in a different inactive but live stack.
>     the old stack becomes inactive (= no task executes on it)
>     and live (= has valid frames to jump to).
>
> with
>
> (3) the runtime must manage the shadow stacks transparently.
>     (= portable c code does not need modifications)
>
> mapping this to c apis:
>
> - swapcontext, setcontext, longjmp, custom stack switching are jump
>   operations. (there are conditions under which (1) and (2) must
>   work, further details don't matter.)
>
> - makecontext creates an inactive live stack.
>
> - signal is only special if it executes on an alt stack: on signal
>   entry the alt stack becomes active and the interrupted stack
>   inactive but live. (nested signals execute on the alt stack until
>   that is left either via a jump or signal return.)
>
> - unwinding can be implemented with jump operations (it needs some
>   other things but that's out of scope here).
>
> the patterns that shadow stack should support falls out of this
> model. (e.g. posix does not allow jumping from one thread to the
> stack of a different thread, but the model does not care about that,
> it only cares if the target stack is inactive and live then jump
> should work.)
>
> some observations:
>
> - it is necessary for jump to detect case (2) and then switch to the
>   target shadow stack. this is also sufficient to implement it.
>   (note: the restore token can be used for detection since that is
>   guaranteed to be present when user code creates an inactive live
>   stack and is not present anywhere else by design. a different
>   marking can be used if the inactive live stack is created by the
>   kernel, but then the kernel has to provide a switch method, e.g.
>   syscall. this should not be controversial.)

For x86's shadow stack you can jump to a new stack without leaving a
token behind. I don't know if maybe we could make it a rule in the
x86_64 ABI that you should always leave a token if you are going to
mark the SHSTK elf bit. But if anything did this, then longjmp() could
never make it back to the stack where setjmp() was called without
kernel help.
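
To make the pattern concrete, here is a minimal sketch of a longjmp()
that has to get back to the setjmp() stack from a different stack. It
is plain glibc C with nothing shadow stack specific in it, it is not
taken from the patch set, and the names cross_stack_longjmp() and
co_entry() are made up for illustration. Without shadow stack this
works on glibc in practice. With shadow stack, the longjmp() would
additionally need a way to get the SSP back onto the original shadow
stack, which is where a restore token or kernel help comes in.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <ucontext.h>

static jmp_buf env;
static ucontext_t main_ctx, co_ctx;

/* Runs on the second stack. The original stack is now inactive but live. */
static void co_entry(void)
{
    /* Jump to a live frame on a different stack: case (2) above. */
    longjmp(env, 1);
}

/* Hypothetical helper: returns 1 if control came back via longjmp(). */
int cross_stack_longjmp(void)
{
    enum { STACK_SIZE = 64 * 1024 };
    volatile int came_back = 0;
    void *stack = malloc(STACK_SIZE);

    assert(stack != NULL);

    if (setjmp(env) == 0) {
        /* Build an inactive live stack and switch to it. */
        if (getcontext(&co_ctx) != 0) {
            free(stack);
            return -1;
        }
        co_ctx.uc_stack.ss_sp = stack;
        co_ctx.uc_stack.ss_size = STACK_SIZE;
        co_ctx.uc_link = &main_ctx;
        makecontext(&co_ctx, co_entry, 0);
        swapcontext(&main_ctx, &co_ctx);
        /* Not reached in this sketch: co_entry() never returns. */
    } else {
        /* Back on the original stack, straight from the other one. */
        came_back = 1;
    }

    free(stack);
    return came_back;
}
```

Calling cross_stack_longjmp() from main() is enough to exercise it. The
interesting part for this thread is only the longjmp() in co_entry():
nothing on the original stack's side was written at switch time to
describe where to come back to.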

>
> - in this model two live stacks cannot use the same shadow stack
>   since jumping between the two stacks is allowed in both directions,
>   but jumping within a shadow stack only works in one direction.
>   (also two tasks could execute on the same shadow stack then. and it
>   makes shadow stack size accounting problematic.)
>
> - so sharing shadow stack with alt stack is broken. (the model is
>   right in the sense that valid posix code can trigger the issue.

Could you spell out what "the issue" is that can be triggered?

>   we can ignore that corner case and adjust the model so the shared
>   shadow stack works for alt stack, but it likely does not change the
>   jump design: eventually we want alt shadow stack.)

As we discussed previously, alt shadow stack can't work transparently
with existing code due to the sigaltstack API. I wonder if maybe you
are trying to get at something else, and I'm not following.
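
For reference, a minimal sketch of how existing code typically uses
sigaltstack(). This is an illustration only: the helper name
handler_runs_on_alt_stack() and the 64 KiB size are assumptions, and
nothing here is shadow stack aware. The point it shows is that the
application allocates, registers and frees the alt stack memory itself,
so the API gives no natural place to pass in or release a matching
shadow stack.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for sigaltstack() and SA_ONSTACK */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uintptr_t alt_lo, alt_hi;
static volatile sig_atomic_t on_alt_stack;

static void handler(int sig)
{
    char probe; /* lives on whatever stack the handler runs on */
    uintptr_t p = (uintptr_t)&probe;

    (void)sig;
    on_alt_stack = (p >= alt_lo && p < alt_hi);
}

/* Hypothetical helper: returns 1 if the handler ran on the alt stack. */
int handler_runs_on_alt_stack(void)
{
    const size_t size = 64 * 1024; /* assumed to be above MINSIGSTKSZ */
    void *mem = malloc(size);      /* application-owned memory */
    stack_t ss, old_ss;
    struct sigaction sa, old_sa;

    assert(mem != NULL);
    alt_lo = (uintptr_t)mem;
    alt_hi = alt_lo + size;

    /* Register the alt stack. Only a base and a size are passed in. */
    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = mem;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0) {
        free(mem);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sa.sa_flags = SA_ONSTACK; /* run this handler on the alt stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0) {
        sigaltstack(&old_ss, NULL);
        free(mem);
        return -1;
    }

    on_alt_stack = 0;
    raise(SIGUSR1); /* delivered before raise() returns */

    /* Tear down. Freeing the memory is invisible to the kernel. */
    sigaction(SIGUSR1, &old_sa, NULL);
    sigaltstack(&old_ss, NULL);
    free(mem);
    return on_alt_stack;
}
```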

>
> - shadow stack cannot always be managed by the runtime transparently:
>   it has to be allocated for makecontext and alt stack in situations
>   where allocation failure cannot be handled. more alarmingly the
>   destruction of stacks may not be visible to the runtime so the
>   corresponding shadow stacks leak. my preferred way to fix this is
>   new apis that are shadow stack compatible (e.g. shadow_makecontext
>   with shadow_freecontext) and marking the incompatible apis as such.
>   portable code then can decide to update to new apis, run with shstk
>   disabled or accept the leaks and OOM failures. the current approach
>   needs ifdef __CET__ in user code for makecontext and sigaltstack
>   has many issues.

This sounds reasonable to me on the face of it. It seems mostly
unrelated to the kernel ABI and purely a userspace thing.

>
> - i'm still not happy with the shadow stack sizing. and would like to
>   have a token at the end of the shadow stack to allow scanning. and
>   it would be nice to deal with shadow stack overflow. and there is
>   async disable on dlopen. so there are things to work on.

I was imagining that for tracing-only users, it might make sense to run
with WRSS enabled. This could mean libc's could write their own restore
tokens. In the case of longjmp() it could be simple and fast. The
implementation could just write a token at the target SSP and switch to
it. Non-C runtimes that want to use it for backtracing could also write
their own preferred stack markers or other data. It is also a whole
different solution from what is being discussed.

But over the course of this thread, I could imagine a little more now
how a top of stack marker could possibly be useful for non-tracing
usages. I have a patch prepared for this and I had tested to see if
adding this later could disturb anything in userspace. The only thing
that I found was that gdb might output a slightly different stack
trace. So it would be a user visible change, if not a regression.

One reason I still held off on it is that the plan for the expanded
shadow stack signal frame includes using a 0 frame, to avoid a forgery
scenario. The token that makes sense for the end of stack marker is
also a 0 frame. So if userspace that looks for the end of stack marker
scans for the 0 frame without checking if it is part of an expanded
shadow stack signal frame, then it could make more trouble for alt
shadow stack.

So since they are tied together, I thought to hold off on it for
now. I don't want to try to squeeze around the upstream userspace; I
think a version 2 should be a clean slate on a new elf bit.

>
> i understand that the proposed linux abi makes most existing binaries
> with shstk marking work, which is relevant for x86.
>
> for a while i thought we can fix the remaining issues even if that
> means breaking existing shstk binaries (just bump the abi marking).
> now it seems the issues can only be addressed in a future abi break.

Adding a new arch_prctl() ENABLE value was the plan. Not sure what you
mean by ABI break vs version bump. The plan was to add the new features
without userspace regression by putting any behavior behind a different
enable option. This relies on userspace to add a new elf bit, and to
use it.
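
As a rough illustration of how userspace sees the features behind the
enable option, here is a sketch that queries which shadow stack
features are currently on. The ARCH_SHSTK_* values below are what I
believe the x86 uapi header (asm/prctl.h) uses, so treat them as an
assumption and prefer the header where available. The helper name
shstk_enabled_features() is made up. It deliberately only queries
status: enabling shadow stack from a function that later returns would
fault, since the new shadow stack has no return address for that frame.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed values, believed to match arch/x86/include/uapi/asm/prctl.h. */
#ifndef ARCH_SHSTK_STATUS
#define ARCH_SHSTK_STATUS 0x5005
#endif
#ifndef ARCH_SHSTK_SHSTK
#define ARCH_SHSTK_SHSTK (1ULL << 0)
#endif
#ifndef ARCH_SHSTK_WRSS
#define ARCH_SHSTK_WRSS (1ULL << 1)
#endif

/*
 * Hypothetical helper. On success returns 0 and stores the enabled
 * feature bits in *features. Returns -1 if the kernel or architecture
 * does not support the query, leaving *features at 0.
 */
int shstk_enabled_features(unsigned long *features)
{
    assert(features != NULL);
    *features = 0;
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    if (syscall(SYS_arch_prctl, ARCH_SHSTK_STATUS, features) == 0)
        return 0;
    return -1;
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would test the result against ARCH_SHSTK_SHSTK or
ARCH_SHSTK_WRSS. A future enable option would presumably show up as an
additional feature bit here, but that is speculation on top of the
plan described above.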

>
> which means x86 linux will likely end up maintaining two incompatible
> abis and the future one will need user code and build system changes,
> not just runtime changes. it is not a small incremental change to add
> alt shadow stack support for example.
>
> i don't think the maintenance burden of two shadow stack abis is the
> right path for arm64 to follow, so the shadow stack semantics will
> likely become divergent not common across targets.

Unfortunately we are at a bit of an information asymmetry here because
the ARM spec and patches are not public. It may be part of the cause of
the confusion.

>
> i hope my position is now clearer.

It kind of sounds like you don't like the x86 glibc implementation. And
you want to make sure the kernel can support whatever a new solution is
that you are working on. I am on board with the goal of having some
generic set of rules to make portable code work for other architectures'
shadow stacks. But I think how close we can get to that goal or what it
looks like is an open question. For several reasons:
1. Not everyone can see all the specs
2. No POCs have been done (or at least shared)
3. It's not clear what needs to be supported (yes, I know you have
made a rough proposal here, but it sounds like on the x86 glibc
side at least it's not even clear what non-shadow-stack stack
switching operations can work together)

But towards these goals, I think your technical requests are:

1. Leave a token on switching to an alt shadow stack. As discussed
earlier, we can't do this because of the overflow issues. Also, since
alt shadow stack cannot be transparent to existing software anyway, it
should be ok to introduce limitations. So I think this one is a no.
What we could do is introduce security weakening kernel helpers, but
this would make sense to come with alt shadow stack support.
2. Add an end token at the top of the shadow stack. Because of the
existing userspace restriction interactions, this is complicated to
evaluate but I think we *could* do this now. There are pros and cons.
3. Support more options for shadow stack sizing. (I think you are
referring to this conversation:
https://lore.kernel.org/lkml/ZAIgrXQ4670gxlE4@xxxxxxx/). I don't see
why this is needed for the base implementation. If ARM wants to add a
new rlimit or clone variant, I don't see why x86 can't support it
later.

So if we add 2, are you satisfied? Or otherwise, on the non-technical
request side, are you asking to hold off on x86 shadow stack, in order
to co-develop a unified solution?

I think the existing solution will see use in the meantime, including
for the development of all the x86 specific JIT implementations.


And finally, what I think is the most important point in all of this:

I think that *how* it gets used will be a better guide for further
development than us debating. For example the main pain point that has
come up so far is the problems around dlopen(). And the future work
that has been scoped has been for the kernel to help out in this area.
This is based on _user_ (distro) requests.

Any apps that don't work with shadow stack limitations can simply not
enable shadow stack. You and I are debating these specific API
combinations, but we can't know whether they are actually the best
place to focus development efforts. And the early signs are this is NOT
the most important problem to solve.