Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

From: Edgecombe, Rick P
Date: Wed Jul 05 2023 - 14:45:59 EST

Next message: Liam R. Howlett: "[PATCH 2/2] mm/mmap: Change detached vma locking scheme"
Previous message: Gatien Chevallier: "[PATCH 10/10] ARM: dts: stm32: add ETZPC as a system bus for STM32MP13x boards"
In reply to: Szabolcs Nagy: "Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description"
Next in thread: Mark Brown: "Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@xxxxxxx wrote:
> Could you spell out what "the issue" is that can be triggered?
>
> i meant jumping back from the main to the alt stack:
>
> in main:
> setup sig alt stack
> setjmp buf1
>         raise signal on first return
>         longjmp buf2 on second return
>
> in signal handler:
> setjmp buf2
>         longjmp buf1 on first return
>         can continue after second return
>
> in my reading of posix this is valid (and works if signals are masked
> such that the alt stack is not clobbered when jumping away from it).
>
> but cannot work with a single shared shadow stack.

Ah, I see. To make this work seamlessly, you would need to have
automatic alt shadow stacks, and as we previously discussed this is not
possible with the existing sigaltstack API. (Or at least it seemed like
a closed discussion to me).

If there is a solution, then we are currently missing a detailed
proposal. It looks like further down you proposed leaking alt shadow
stacks (quoted up here near the related discussion):

On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@xxxxxxx wrote:
> maybe not in glibc, but a libc can internally use alt shadow stack
> in sigaltstack instead of exposing a separate sigaltshadowstack api.
> (this is what a strict posix conform implementation has to do to
> support shadow stacks), leaking shadow stacks is not a correctness
> issue unless it prevents the program working (the shadow stack for
> the main thread likely wastes more memory than all the alt stack
> leaks. if the leaks become dominant in a thread the sigaltstack
> libc api can just fail).

It seems like your priority must be to make sure pure C apps don't have
to make any changes in order to not crash with shadow stack enabled.
And this at the expense of any performance and memory usage. Do you
have some formalized priorities or design philosophy you can share?

Earlier you suggested glibc should create new interfaces to handle
makecontext() (makes sense). Shouldn't the same thing happen here? In
which case we are in code-changes territory and we should ask ourselves
what apps really need.

>
> > > we can ignore that corner case and adjust the model so the shared
> > > shadow stack works for alt stack, but it likely does not change the
> > > jump design: eventually we want alt shadow stack.)
> >
> > As we discussed previously, alt shadow stack can't work transparently
> > with existing code due to the sigaltstack API. I wonder if maybe you
> > are trying to get at something else, and I'm not following.
>
> i would like a jump design that works with alt shadow stack.

A shadow stack switch could happen based on the following scenarios:
1. Alt shadow stack
2. ucontext
3. custom stack switching logic

If we leave a token on signal, then 1 and 2 could be guaranteed to have
a token *somewhere* above where setjmp() could have been called.

The algorithm could be to search from the target SSP up the stack until
it finds a token, and then switch to it and INCSSP back to the SSP of
the setjmp() point. This is what we are talking about, right?
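
To make the search concrete, here is a rough user-space model of it. This
is only a sketch, not glibc or kernel code: the shadow stack is a plain
array that grows toward index 0, so "up the stack" means toward lower
indices. The names make_token(), is_token() and jump_to_ssp() are made up
for illustration, and the token encoding is only loosely modeled on the
x86 restore token (address of the next-higher slot with the low bit set).
It also assumes return addresses never look like a token. The INCSSP step
is done in chunks of at most 255 slots, since INCSSP only uses the low 8
bits of its operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* INCSSP only honors the low 8 bits of its operand. */
#define INCSSP_MAX 255

/*
 * Hypothetical token encoding for this model: the token in slot i holds
 * the "address" of slot i + 1 with the low bit set.
 */
static uint64_t make_token(size_t slot)
{
    return ((uint64_t)(slot + 1) << 3) | 1u;
}

/* A slot holds a token only if its value matches its own position. */
static int is_token(const uint64_t *ss, size_t slot)
{
    return ss[slot] == make_token(slot);
}

/*
 * Model of the proposed longjmp() path: scan from the saved SSP (target)
 * toward the top of the stack (lower indices) for a restore token, switch
 * to it, then pop back to the saved SSP.
 *
 * Returns the number of INCSSP operations used, or -1 if no token was
 * found (scenario 3: custom stack switching that left no token).
 */
static long jump_to_ssp(const uint64_t *ss, size_t target, size_t *ssp)
{
    size_t slot = target;
    long steps = 0;

    assert(ss != NULL && ssp != NULL);

    while (!is_token(ss, slot)) {
        if (slot == 0)
            return -1;          /* ran off the top without finding a token */
        slot--;
    }

    *ssp = slot;                /* stands in for RSTORSSP on the token */

    while (*ssp < target) {     /* stands in for the INCSSP loop */
        size_t n = target - *ssp;

        if (n > INCSSP_MAX)
            n = INCSSP_MAX;
        *ssp += n;
        steps++;
    }
    return steps;
}
```

In this model, with a token at slot 300 and a setjmp() SSP at slot 900,
the jump takes three INCSSP operations (255 + 255 + 90 slots). The scan
and the pop loop are both linear in the distance between the token and
the setjmp() point, which is the cost concern raised below.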

And the two problems are:
- Alt shadow stack overflow problem
- In the case of (3) there might not be a token

Let's ignore these problems for a second - now we have a solution that
allows you to longjmp() back from an alt stack or ucontext stack. Or at
least it works functionally. But is it going to actually work for
people who are using longjmp() for things that are supposed to be fast?
Like, is this the tradeoff people want? I see some references to fiber
switching implementations using longjmp(). I wonder if the existing
INCSSP loops are not going to be ideal for every usage already, and
this sounds like going further down that road.

For jumping out occasionally in some error case, it seems it would be
useful. But I think we are then talking about targeting a subset of
people using these stack switching patterns.

Looking at the docs Mark linked (thanks!), ARM has generic GCS PUSH and
POP shadow stack instructions? Can ARM just push a restore token at
setjmp time, like I was trying to figure out earlier with a push token
arch_prctl? It would be good to understand how ARM is going to
implement this with these differences in what is allowed by the HW.

If there are differences in how locked down/functional the hardware
implementations are, and if we want to have some unified set of rules
for apps, there will need to be some give and take. The x86 approach was
mostly to not support all behaviors and ask apps to either change or
not enable shadow stacks. We don't want one architecture to have to do
a bunch of strange things, but we also don't want one to lose some key
end user value.

I'm thinking that for pure tracing users, glibc might do things a lot
differently (use of WRSS to speed things up). So I'm guessing we will
end up with at least one more "policy" on the x86 side.

I wonder if maybe we should have something like a "max compatibility"
policy/mode where arm/x86/riscv could all behave the same from the
glibc caller perspective. We could add kernel help to achieve this for
any implementation that is more locked down. And maybe that is x86's v2
ABI. I don't know, just sort of thinking out loud at this point. And
this sort of gets back to the point I keep making: if we need to decide
tradeoffs, it would be great to get some users to start using this and
start telling us what they want. Are people caring mostly about
security, compatibility or performance?

[snip]
