PKRU issue while using alternate signal stack

From: Aruna Ramakrishna
Date: Wed Feb 21 2024 - 15:01:53 EST


(Re-sending to the list; the previous email had some formatting issues. I apologize.)

Hello,

We’re running into an issue with the delayed PKRU update during signal delivery, and we don’t have a proposed solution for it yet.

Our use case is this:

The application has many threads that run code deemed to be untrusted. Each thread has its stack/code protected by a non-zero pkey, and the PKRU register is set up so that only that particular non-zero pkey is enabled. Each thread also sets up an alternate signal stack to handle signals, and that stack is protected by pkey zero.

The pkeys man page documents that PKRU is reset to its default (init) value when the signal handler is invoked, which enables access to pkey zero; that is why the alternate signal stack is protected with pkey zero. But this reset happens only after the kernel attempts to push the fpstate onto the alternate stack, which is not yet accessible under the thread's current PKRU. The write fails, a new SIGSEGV is sent to the application, and the application is terminated.
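To make the bit-level reason for the fault concrete, here is a small model of the x86 PKRU layout: two bits per protection key, AD (access disable) at bit 2k and WD (write disable) at bit 2k+1. The helper names are hypothetical, and the default value 0x55555554 (access disabled for keys 1 through 15, key 0 fully open) is an assumption about the kernel's usual init value, not something taken from this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* PKRU holds 2 bits per protection key (16 keys):
 *   bit 2*k     = AD (access disable) for key k
 *   bit 2*k + 1 = WD (write disable)  for key k */
#define PKRU_AD_BIT 0x1u
#define PKRU_WD_BIT 0x2u
#define PKRU_BITS_PER_PKEY 2
#define PKRU_NR_KEYS 16

/* Hypothetical helper: PKRU value that fully disables every key except
 * `pkey`, i.e. the per-thread setup described above. */
static inline uint32_t pkru_allow_only(int pkey)
{
	return ~(uint32_t)((PKRU_AD_BIT | PKRU_WD_BIT)
			   << (pkey * PKRU_BITS_PER_PKEY));
}

/* Hypothetical model of the default value: AD set for keys 1..15,
 * so only key 0 is accessible (assumed to match the kernel default). */
static inline uint32_t pkru_default_model(void)
{
	uint32_t v = 0;

	for (int k = 1; k < PKRU_NR_KEYS; k++)
		v |= PKRU_AD_BIT << (k * PKRU_BITS_PER_PKEY);
	return v;
}

/* A data write to a page tagged with `pkey` is allowed only if
 * neither AD nor WD is set for that key. */
static inline bool pkru_can_write(uint32_t pkru, int pkey)
{
	uint32_t bits = pkru >> (pkey * PKRU_BITS_PER_PKEY);

	return !(bits & (PKRU_AD_BIT | PKRU_WD_BIT));
}
```

With the thread's PKRU set by pkru_allow_only(3), pkru_can_write(pkru, 0) is false, so a write to the pkey-0 alternate stack is denied; only after the reset to the default value does pkey 0 become writable.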

This is the relevant snippet:

In handle_signal():

..
	failed = (setup_rt_frame(ksig, regs) < 0);	/* <- pkru reset should happen before this */
	if (!failed) {
		/*
		 * Clear the direction flag as per the ABI for function entry.
		 *
		 * Clear RF when entering the signal handler, because
		 * it might disable possible debug exception from the
		 * signal handler.
		 *
		 * Clear TF for the case when it wasn't set by debugger to
		 * avoid the recursive send_sigtrap() in SIGTRAP handler.
		 */
		regs->flags &= ~(X86_EFLAGS_DF|X86_EFLAGS_RF|X86_EFLAGS_TF);
		/*
		 * Ensure the signal handler starts with the new fpu state.
		 */
		fpu__clear_user_states(fpu);	/* <- pkru resets here, via pkru_write_default() */
	}
	signal_setup_done(failed, ksig, stepping);
..

Failure path: setup_rt_frame() -> x64_setup_rt_frame() -> get_sigframe() -> copy_fpstate_to_sigframe() -> __clear_user(), which fails; the application then receives a SIGSEGV with si_code set to SEGV_PKUERR.

The PKRU value is reset to the default (enabling pkey 0 only) in fpu__clear_user_states().

If the pkru_write_default() call were moved earlier in this flow, before copy_fpstate_to_sigframe(), signal handling would work as expected. However, this code path is quite complicated, and we’d appreciate some expert opinion.
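The ordering question can be illustrated with a toy user-space model (not kernel code). Every name here is hypothetical: model_push_frame() stands in for copy_fpstate_to_sigframe(), and the model assumes, as described above, that the kernel's write to the alternate stack is checked against the thread's current PKRU. The default value 0x55555554 is again an assumption about the kernel's init value.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKRU_AD_BIT 0x1u
#define PKRU_WD_BIT 0x2u

/* Assumed default: access disabled for keys 1..15, key 0 fully open. */
#define MODEL_DEFAULT_PKRU 0x55555554u

static inline bool model_can_write(uint32_t pkru, int pkey)
{
	return !((pkru >> (2 * pkey)) & (PKRU_AD_BIT | PKRU_WD_BIT));
}

struct model_thread {
	uint32_t pkru;		/* current PKRU of the thread */
	int altstack_pkey;	/* pkey protecting the alternate signal stack */
};

/* Stand-in for copy_fpstate_to_sigframe(): succeeds only if the
 * alternate stack's pkey is writable under the current PKRU. */
static inline bool model_push_frame(const struct model_thread *t)
{
	return model_can_write(t->pkru, t->altstack_pkey);
}

/* Returns true if signal delivery succeeds in the model.
 * reset_first == false: current order (push frame, then reset PKRU).
 * reset_first == true:  proposed order (reset PKRU, then push frame). */
static inline bool model_deliver_signal(struct model_thread t, bool reset_first)
{
	if (reset_first)
		t.pkru = MODEL_DEFAULT_PKRU;

	bool ok = model_push_frame(&t);

	if (ok && !reset_first)
		t.pkru = MODEL_DEFAULT_PKRU;
	return ok;
}
```

For a thread whose PKRU enables only pkey 3 and whose alternate stack uses pkey 0, the current order fails and the proposed order succeeds. One detail the model glosses over: the frame should presumably still record the interrupted context's PKRU, so that sigreturn restores the thread's original value rather than the default.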

Thanks,
Aruna