Re: [PATCH V3 1/6] x86_64: move cpu_current_top_of_stack out of TSS

From: Borislav Petkov
Date: Thu Feb 04 2021 - 11:48:18 EST


> Subject: Re: [PATCH V3 1/6] x86_64: move cpu_current_top_of_stack out of TSS

The tip tree preferred format for patch subject prefixes is
'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
'genirq/core:'. Please do not use file names or complete file paths as
prefix. 'git log path/to/file' should give you a reasonable hint in most
cases.

The condensed patch description in the subject line should start with a
uppercase letter and should be written in imperative tone.

Fix all your subjects pls.

On Thu, Jan 28, 2021 at 12:32:17AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
>
> When X86_BUG_CPU_MELTDOWN & KPTI,

Please write that out properly.

> cpu_current_top_of_stack lives in the
> TSS which is also in the user CR3 and it becomes a coveted fruit. An
> attacker can fetch the kernel stack top from it and continue next steps
> of actions based on the kernel stack.
>
> The address might not be very usefull for attacker,

You just said it is a "coveted fruit" and now it is not very useful?!
Which is it?

Also, use spellchecker pls.

> but it is not so
> necessary to be in TSS either. It is only accessed when CR3 is kernel CR3
> and gs_base is kernel gs_base

You mean when at CPL0?

> which means it can be in any percpu variable.
>
> The major reason it is in TSS might be performance because it is hot in

"might be"?

> cache and tlb since we just access sp2 as the scratch space in syscall.

"TLB"

You can use the comment text directly as it is more precise:

"entry_SYSCALL_64 uses it as scratch space to stash the user RSP value."

>
> So we can move it to a percpu variable near other hot percpu variables,

Who's "we" ?

Also, from Documentation/process/submitting-patches.rst:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

> such as current_task, __preempt_count, and they are in the same
> cache line.
>
> tools/testing/selftests/seccomp/seccomp_benchmark desn't show any
> performance lost in "getpid native" result. And actually, the result
> changes from 93ns before patch to 92ns after patch when !KPTI, and the
> test is very stable although the test desn't show a higher degree of
> precision but enough to know it doesn't cause degression for the test.

This paragraph belongs ...

> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> ---

....

<--- here.

> arch/x86/include/asm/processor.h | 10 ----------
> arch/x86/include/asm/switch_to.h | 7 +------
> arch/x86/include/asm/thread_info.h | 6 ------
> arch/x86/kernel/cpu/common.c | 3 +++
> arch/x86/kernel/process.c | 7 +------
> arch/x86/mm/pti.c | 7 +++----
> 6 files changed, 8 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index c20a52b5534b..886d32da1318 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -314,11 +314,6 @@ struct x86_hw_tss {
> struct x86_hw_tss {
> u32 reserved1;
> u64 sp0;
> -
> - /*
> - * We store cpu_current_top_of_stack in sp1 so it's always accessible.
> - * Linux does not use ring 1, so sp1 is not otherwise needed.
> - */
> u64 sp1;
>
> /*
> @@ -428,12 +423,7 @@ struct irq_stack {
>
> DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
>
> -#ifdef CONFIG_X86_32
> DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
> -#else
> -/* The RO copy can't be accessed with this_cpu_xyz(), so use the RW copy. */
> -#define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1
> -#endif
>
> #ifdef CONFIG_X86_64
> struct fixed_percpu_data {
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index 9f69cc497f4b..b5f0d2ff47e4 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -71,12 +71,7 @@ static inline void update_task_stack(struct task_struct *task)
> else
> this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
> #else
> - /*
> - * x86-64 updates x86_tss.sp1 via cpu_current_top_of_stack. That
> - * doesn't work on x86-32 because sp1 and
> - * cpu_current_top_of_stack have different values (because of
> - * the non-zero stack-padding on 32bit).
> - */
> + /* Xen PV enters the kernel on the thread stack. */

What's that comment here for?

> if (static_cpu_has(X86_FEATURE_XENPV))
> load_sp0(task_top_of_stack(task));
> #endif
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index 0d751d5da702..3dc93d8df425 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -197,12 +197,6 @@ static inline int arch_within_stack_frames(const void * const stack,
> #endif
> }
>
> -#else /* !__ASSEMBLY__ */
> -
> -#ifdef CONFIG_X86_64
> -# define cpu_current_top_of_stack (cpu_tss_rw + TSS_sp1)
> -#endif
> -
> #endif
>
> #ifdef CONFIG_COMPAT
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 35ad8480c464..f3d7fd7e9684 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1745,6 +1745,9 @@ DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
> DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
> EXPORT_PER_CPU_SYMBOL(__preempt_count);
>
> +DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
> +EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);

... _GPL, of course.

Or lemme rephrase: who's going to cry if this export is _GPL?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette