[PATCH 1/3] x86/membarrier: Get rid of a dubious optimization

From: Andy Lutomirski
Date: Mon Nov 30 2020 - 12:51:28 EST


sync_core_before_usermode() had an incorrect optimization. If we're
in an IRQ, we can get to usermode without IRET -- we just have to
schedule to a different task in the same mm and do SYSRET.
Fortunately, there were no callers of sync_core_before_usermode()
that could have had in_irq() or in_nmi() equal to true, because it's
only ever called from the scheduler.

While we're at it, clarify a related comment.

Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
---
arch/x86/include/asm/sync_core.h | 9 +++++----
arch/x86/mm/tlb.c | 6 ++++--
2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 0fd4a9dfb29c..ab7382f92aff 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -98,12 +98,13 @@ static inline void sync_core_before_usermode(void)
/* With PTI, we unconditionally serialize before running user code. */
if (static_cpu_has(X86_FEATURE_PTI))
return;
+
/*
- * Return from interrupt and NMI is done through iret, which is core
- * serializing.
+ * Even if we're in an interrupt, we might reschedule before returning,
+ * in which case we could switch to a different thread in the same mm
+ * and return using SYSRET or SYSEXIT. Instead of trying to keep
+ * track of our need to sync the core, just sync right away.
*/
- if (in_irq() || in_nmi())
- return;
sync_core();
}

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 11666ba19b62..dabe683ab076 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -474,8 +474,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
/*
* The membarrier system call requires a full memory barrier and
* core serialization before returning to user-space, after
- * storing to rq->curr. Writing to CR3 provides that full
- * memory barrier and core serializing instruction.
+ * storing to rq->curr, when changing mm. This is because membarrier()
+ * sends IPIs to all CPUs that are in the target mm, but another
+ * CPU switch to the target mm in the mean time. Writing to CR3
+ * provides that full memory barrier and core serializing instruction.
*/
if (real_prev == next) {
VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
--
2.28.0