Re: [tip: x86/entry] x86/entry: Avoid redundant CR3 write on paranoid returns

From: Brendan Jackman
Date: Mon Feb 19 2024 - 05:50:08 EST


[Apologies if you see this as a duplicate, accidentally sent the
original in HTML, please disregard the other one]

Hi Thomas,

I have just noticed that the commit has disappeared from
tip/x86/entry. Is that deliberate?

Thanks,
Brendan


On Wed, 24 Jan 2024 at 19:36, tip-bot2 for Lai Jiangshan
<tip-bot2@xxxxxxxxxxxxx> wrote:
>
> The following commit has been merged into the x86/entry branch of tip:
>
> Commit-ID: bb998361999e79bc87dae1ebe0f5bf317f632585
> Gitweb: https://git.kernel.org/tip/bb998361999e79bc87dae1ebe0f5bf317f632585
> Author: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> AuthorDate: Mon, 08 Jan 2024 11:39:50
> Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitterDate: Wed, 24 Jan 2024 13:57:59 +01:00
>
> x86/entry: Avoid redundant CR3 write on paranoid returns
>
> The CR3 restore happens in:
>
> 1. #NMI return.
> 2. paranoid_exit() (i.e. #MCE, #VC, #DB and #DF return)
>
> Contrary to the implication in commit 21e94459110252 ("x86/mm: Optimize
> RESTORE_CR3"), the kernel never modifies CR3 in any of these exceptions,
> except for switching from user to kernel pagetables under PTI. That
> means that most of the time when returning from an exception that
> interrupted the kernel no CR3 restore is necessary. Writing CR3 is
> expensive on some machines.
>
> Most of the time because the interrupt might have come during kernel entry
> before the user to kernel CR3 switch or the during exit after the kernel to
> user switch. In the former case skipping the restore would be correct, but
> definitely not for the latter.
>
> So check the saved CR3 value and restore it only, if it is a user CR3.
>
> Give the macro a new name to clarify its usage, and remove a comment that
> was describing the original behaviour along with the not longer needed jump
> label.
>
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Link: https://lore.kernel.org/r/20240108113950.360438-1-jackmanb@xxxxxxxxxx
>
> [Rewrote commit message; responded to review comments]
> Change-Id: I6e56978c4753fb943a7897ff101f519514fa0827
> ---
> arch/x86/entry/calling.h | 26 ++++++++++----------------
> arch/x86/entry/entry_64.S | 7 +++----
> 2 files changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 9f1d947..92dca4a 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -239,17 +239,19 @@ For 32-bit we have the following conventions - kernel is built with
> .Ldone_\@:
> .endm
>
> -.macro RESTORE_CR3 scratch_reg:req save_reg:req
> +/* Restore CR3 from a kernel context. May restore a user CR3 value. */
> +.macro PARANOID_RESTORE_CR3 scratch_reg:req save_reg:req
> ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>
> - ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
> -
> /*
> - * KERNEL pages can always resume with NOFLUSH as we do
> - * explicit flushes.
> + * If CR3 contained the kernel page tables at the paranoid exception
> + * entry, then there is nothing to restore as CR3 is not modified while
> + * handling the exception.
> */
> bt $PTI_USER_PGTABLE_BIT, \save_reg
> - jnc .Lnoflush_\@
> + jnc .Lend_\@
> +
> + ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
>
> /*
> * Check if there's a pending flush for the user ASID we're
> @@ -257,20 +259,12 @@ For 32-bit we have the following conventions - kernel is built with
> */
> movq \save_reg, \scratch_reg
> andq $(0x7FF), \scratch_reg
> - bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
> - jnc .Lnoflush_\@
> -
> btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
> - jmp .Lwrcr3_\@
> + jc .Lwrcr3_\@
>
> -.Lnoflush_\@:
> SET_NOFLUSH_BIT \save_reg
>
> .Lwrcr3_\@:
> - /*
> - * The CR3 write could be avoided when not changing its value,
> - * but would require a CR3 read *and* a scratch register.
> - */
> movq \save_reg, %cr3
> .Lend_\@:
> .endm
> @@ -285,7 +279,7 @@ For 32-bit we have the following conventions - kernel is built with
> .endm
> .macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
> .endm
> -.macro RESTORE_CR3 scratch_reg:req save_reg:req
> +.macro PARANOID_RESTORE_CR3 scratch_reg:req save_reg:req
> .endm
>
> #endif
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index c40f89a..aedd169 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -968,14 +968,14 @@ SYM_CODE_START_LOCAL(paranoid_exit)
> IBRS_EXIT save_reg=%r15
>
> /*
> - * The order of operations is important. RESTORE_CR3 requires
> + * The order of operations is important. PARANOID_RESTORE_CR3 requires
> * kernel GSBASE.
> *
> * NB to anyone to try to optimize this code: this code does
> * not execute at all for exceptions from user mode. Those
> * exceptions go through error_return instead.
> */
> - RESTORE_CR3 scratch_reg=%rax save_reg=%r14
> + PARANOID_RESTORE_CR3 scratch_reg=%rax save_reg=%r14
>
> /* Handle the three GSBASE cases */
> ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
> @@ -1404,8 +1404,7 @@ end_repeat_nmi:
> /* Always restore stashed SPEC_CTRL value (see paranoid_entry) */
> IBRS_EXIT save_reg=%r15
>
> - /* Always restore stashed CR3 value (see paranoid_entry) */
> - RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
> + PARANOID_RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
>
> /*
> * The above invocation of paranoid_entry stored the GSBASE