Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads

From: Linus Torvalds
Date: Sun Jun 13 2021 - 18:19:26 EST


On Sun, Jun 13, 2021 at 2:55 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> The alpha_switch_to will remove the extra registers from the stack and
> then call ret which if I understand alpha assembly correctly is
> equivalent to jumping to where $26 points. Which is
> ret_from_kernel_thread (as setup by copy_thread).
>
> Which leaves ret_from_kernel_thread and everything it calls without
> the extra context saved on the stack.

Uhhuh. Right you are, I think. It's been ages since I worked on that
code and my alpha handbook is somewhere else, but yes, when
alpha_switch_to() has context-switched to the new PCB state, it will
then pop those registers in the new context and return.

So we do set up the right stack frame for the worker thread, but as
you point out, it then gets used up immediately when running. So by
the time the IO worker thread calls get_signal(), it's no longer
useful.

How very annoying.

The (obviously UNTESTED) patch might be something like the attached.

I wouldn't be surprised if m68k has the exact same thing for the exact
same reason, but I didn't check..

Linus

arch/alpha/kernel/process.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 5112ab996394..edbfe03f4b2c 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -251,8 +251,17 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,

 	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
 		/* kernel thread */
+		/*
+		 * Give it *two* switch stacks, one for the kernel
+		 * state return that is used up by alpha_switch_to,
+		 * and one for the "user state" which is accessed
+		 * by ptrace.
+		 */
+		childstack--;
+		childti->pcb.ksp = (unsigned long) childstack;
+
 		memset(childstack, 0,
-			sizeof(struct switch_stack) + sizeof(struct pt_regs));
+			2*sizeof(struct switch_stack) + sizeof(struct pt_regs));
 		childstack->r26 = (unsigned long) ret_from_kernel_thread;
 		childstack->r9 = usp;	/* function */
 		childstack->r10 = kthread_arg;
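
For readers who want to see the stack arithmetic in isolation, here is a
rough user-space model of the idea, not kernel code. Everything in it is
an assumption made for illustration: the struct layouts are tiny
stand-ins for the real alpha struct pt_regs and struct switch_stack,
and model_switch_to() and frames_left_after_switch() are hypothetical
helpers that only mimic "pop one switch_stack and return". The sketch
counts how many switch_stack frames still sit directly below pt_regs
after the first context switch into the new thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in layouts (assumption): the real alpha structs are larger. */
struct pt_regs { unsigned long r[4]; };
struct switch_stack { unsigned long r9, r10, r26; };

/*
 * Hypothetical model of the stack effect of alpha_switch_to():
 * it consumes exactly one switch_stack from the new kernel stack.
 */
static uintptr_t model_switch_to(uintptr_t ksp)
{
	return ksp + sizeof(struct switch_stack);
}

/*
 * Lay out pt_regs at the top of a fake kernel stack with 'nframes'
 * switch_stack frames below it (as copy_thread() would), run one
 * modeled context switch, and report how many switch_stack frames
 * remain between the stack pointer and pt_regs.
 */
static size_t frames_left_after_switch(size_t nframes)
{
	static unsigned long stack[128];
	struct pt_regs *childregs = (struct pt_regs *)(stack + 128) - 1;
	struct switch_stack *childstack =
		(struct switch_stack *)childregs - nframes;
	uintptr_t ksp;

	memset(childstack, 0,
	       nframes * sizeof(struct switch_stack) + sizeof(struct pt_regs));

	ksp = (uintptr_t)childstack;	/* what pcb.ksp would hold */
	ksp = model_switch_to(ksp);	/* first switch into the thread */

	return ((uintptr_t)childregs - ksp) / sizeof(struct switch_stack);
}
```

In this model, frames_left_after_switch(1) corresponds to the layout
before the patch and leaves nothing below pt_regs once the thread is
running, while frames_left_after_switch(2) corresponds to the patched
layout and leaves one frame in place for ptrace to look at.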