Re: [RFC/INCOMPLETE 00/13] x86: Rewrite exit-to-userspace code

From: Ingo Molnar
Date: Wed Jun 17 2015 - 07:05:04 EST



* Richard Weinberger <richard.weinberger@xxxxxxxxx> wrote:

> On Wed, Jun 17, 2015 at 11:48 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > * Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >
> >> This is incomplete, but it's finally good enough that I think it's
> >> time to get other opinions on it. It is a complete rewrite of the
> >> slow path code that handles exits to user mode.
> >
> > Modulo the small comments I made about the debug checks interface plus naming
> > details, the structure and intention of this series gives me warm fuzzy feelings.
> >
> >> The exit-to-usermode code is copied in several places and is written in a nasty
> >> combination of asm and C. It's not at all clear what it's supposed to do, and
> >> the way it's structured makes it very hard to work with. For example, it's not
> >> even clear why syscall exit hooks are called only once per syscall right now.
> >> (It seems to be a side effect of the way that rdi and rdx are handled in the asm
> >> loop, and it seems reliable, but it's still pointlessly complicated.) The
> >> existing code also makes context tracking overly complicated and hard to
> >> understand. Finally, it's nearly impossible for anyone to change what happens
> >> on exit to usermode, since the existing code is so fragile.
> >
> > Amen.
> >
> >> I tried to clean it up incrementally, but I decided it was too hard. Instead,
> >> this series just replaces the code. It seems to work.
> >
> > Any known bugs beyond UML build breakage?
> >
> >> Context tracking in particular works very differently now. The low-level entry
> >> code checks that we're in CONTEXT_USER and switches to CONTEXT_KERNEL. The exit
> >> code does the reverse. There is no need to track what CONTEXT_XYZ state we came
> >> from, because we already know. Similarly, SCHEDULE_USER is gone, since we can
> >> reschedule if needed by simply calling schedule() from C code.
> >>
> >> The main things that are missing are that I haven't done the 32-bit parts
> >> (anyone want to help?) and therefore I haven't deleted the old C code. I also
> >> think this may break UML for trivial reasons.
> >>
> >> Because I haven't converted the 32-bit code yet, all of the now-unnecessary
> >> calls to exception_enter are still present in traps.c.
> >>
> >> IRQ context tracking is still duplicated. We should probably clean it up by
> >> changing the core code to supply something like
> >> irq_enter_we_are_already_in_context_kernel.
> >>
> >> Thoughts?
> >
> > So assuming you fix the UML build I'm inclined to go for it, even in this
> > incomplete form, to increase testing coverage.
>
> Andy, can you please share the build breakage you're facing?
> I'll happily help you fix it.

So they come in the form of:

./arch/um/include/shared/kern_util.h:25:12: error: conflicting types for ‘do_signal’

which comes from x86 now also having a do_signal().

The patch below fixes it by harmonizing the UML implementation with the x86 one.
This improves the UML side a bit, and fixes the build failure.
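
As a rough illustration of the harmonized shape, here is a minimal user-space
sketch. It is not kernel code: struct mock_task, mock_current and
mock_notify_resume() are made up for the example, and the boolean fields only
stand in for TIF_SIGPENDING and TIF_NOTIFY_RESUME. It shows a single
void do_signal(struct pt_regs *) prototype and an interrupt_end() that computes
'regs' once and hands it to both callees:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the real register frame; the field is illustrative only. */
struct pt_regs {
	unsigned long ip;
};

/* Hypothetical mock of the bits of task state this sketch needs. */
struct mock_task {
	struct pt_regs regs;
	bool sigpending;                    /* stands in for TIF_SIGPENDING */
	bool notify_resume;                 /* stands in for TIF_NOTIFY_RESUME */
	int signals_handled;
	int resumes_notified;
	const struct pt_regs *signal_regs;  /* regs pointer seen by do_signal() */
	const struct pt_regs *resume_regs;  /* regs pointer seen by the notifier */
};

static struct mock_task mock_current;

/* One prototype: takes the register frame, returns nothing. */
void do_signal(struct pt_regs *regs);

void do_signal(struct pt_regs *regs)
{
	mock_current.signal_regs = regs;
	mock_current.sigpending = false;
	mock_current.signals_handled++;
}

/* Hypothetical stand-in for tracehook_notify_resume(). */
static void mock_notify_resume(struct pt_regs *regs)
{
	mock_current.resume_regs = regs;
	mock_current.resumes_notified++;
}

void interrupt_end(void)
{
	/* Computed once and shared by both callees below. */
	struct pt_regs *regs = &mock_current.regs;

	if (mock_current.sigpending)
		do_signal(regs);
	if (mock_current.notify_resume) {
		mock_current.notify_resume = false;
		mock_notify_resume(regs);
	}
}
```

With both mock flags set, a single interrupt_end() call should run each handler
once, and both should see the same 'regs' pointer.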

Thanks,

Ingo

=========================>
Subject: uml: Fix do_signal() prototype
From: Ingo Molnar <mingo@xxxxxxxxxx>
Date: Wed Jun 17 12:58:37 CEST 2015

Now that x86 exports its do_signal(), the prototypes clash.

Fix the clash and also improve the code a bit: remove the unnecessary
kern_do_signal() indirection. This allows interrupt_end() to share
the 'regs' parameter calculation.

Also remove the unused return code to match x86.

Minimally build and boot tested.

Cc: Richard Weinberger <richard.weinberger@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
arch/um/include/shared/kern_util.h | 3 ++-
arch/um/kernel/process.c | 6 ++++--
arch/um/kernel/signal.c | 8 +-------
arch/um/kernel/tlb.c | 2 +-
arch/um/kernel/trap.c | 2 +-
5 files changed, 9 insertions(+), 12 deletions(-)

Index: tip/arch/um/include/shared/kern_util.h
===================================================================
--- tip.orig/arch/um/include/shared/kern_util.h
+++ tip/arch/um/include/shared/kern_util.h
@@ -22,7 +22,8 @@ extern int kmalloc_ok;
 extern unsigned long alloc_stack(int order, int atomic);
 extern void free_stack(unsigned long stack, int order);

-extern int do_signal(void);
+struct pt_regs;
+extern void do_signal(struct pt_regs *regs);
 extern void interrupt_end(void);
 extern void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs);

Index: tip/arch/um/kernel/process.c
===================================================================
--- tip.orig/arch/um/kernel/process.c
+++ tip/arch/um/kernel/process.c
@@ -90,12 +90,14 @@ void *__switch_to(struct task_struct *fr

 void interrupt_end(void)
 {
+	struct pt_regs *regs = &current->thread.regs;
+
 	if (need_resched())
 		schedule();
 	if (test_thread_flag(TIF_SIGPENDING))
-		do_signal();
+		do_signal(regs);
 	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
-		tracehook_notify_resume(&current->thread.regs);
+		tracehook_notify_resume(regs);
 }

 void exit_thread(void)
Index: tip/arch/um/kernel/signal.c
===================================================================
--- tip.orig/arch/um/kernel/signal.c
+++ tip/arch/um/kernel/signal.c
@@ -64,7 +64,7 @@ static void handle_signal(struct ksignal
 	signal_setup_done(err, ksig, singlestep);
 }

-static int kern_do_signal(struct pt_regs *regs)
+void do_signal(struct pt_regs *regs)
 {
 	struct ksignal ksig;
 	int handled_sig = 0;
@@ -110,10 +110,4 @@ static int kern_do_signal(struct pt_regs
 	 */
 	if (!handled_sig)
 		restore_saved_sigmask();
-	return handled_sig;
-}
-
-int do_signal(void)
-{
-	return kern_do_signal(&current->thread.regs);
 }
Index: tip/arch/um/kernel/tlb.c
===================================================================
--- tip.orig/arch/um/kernel/tlb.c
+++ tip/arch/um/kernel/tlb.c
@@ -291,7 +291,7 @@ void fix_range_common(struct mm_struct *
 		/* We are under mmap_sem, release it such that current can terminate */
 		up_write(&current->mm->mmap_sem);
 		force_sig(SIGKILL, current);
-		do_signal();
+		do_signal(&current->thread.regs);
 	}
 }

Index: tip/arch/um/kernel/trap.c
===================================================================
--- tip.orig/arch/um/kernel/trap.c
+++ tip/arch/um/kernel/trap.c
@@ -173,7 +173,7 @@ static void bad_segv(struct faultinfo fi
 void fatal_sigsegv(void)
 {
 	force_sigsegv(SIGSEGV, current);
-	do_signal();
+	do_signal(&current->thread.regs);
 	/*
 	 * This is to tell gcc that we're not returning - do_signal
 	 * can, in general, return, but in this case, it's not, since