[RFC PATCH 3/3] Fix: sched/membarrier: use probe_kernel_address to read mm->membarrier_state

From: Mathieu Desnoyers
Date: Tue Sep 03 2019 - 12:00:55 EST


membarrier_global_expedited() reads mm->membarrier_state of each
runqueue's current task to test a flag. A concurrent task exit can free
the memory backing the struct mm before that read happens. Therefore,
use probe_kernel_address() to read the flag. If the read fails, treat
the flag as unset: the scheduler code then provides the barriers
required by membarrier, and sending an IPI to that CPU is redundant.

There is ongoing discussion about removing the need for
probe_kernel_address in this code by providing additional existence
guarantees for struct mm. This patch is submitted as a fix intended to
be backported to prior stable kernel releases.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx>
Cc: Chris Metcalf <cmetcalf@xxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Cc: Kirill Tkhai <tkhai@xxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 02feb7c8da4f..0312604d8315 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -58,6 +58,8 @@ static int membarrier_global_expedited(void)
cpus_read_lock();
for_each_online_cpu(cpu) {
struct task_struct *p;
+ struct mm_struct *mm;
+ int membarrier_state;

/*
* Skipping the current CPU is OK even though we can be
@@ -72,17 +74,34 @@ static int membarrier_global_expedited(void)

rcu_read_lock();
p = task_rcu_dereference(&cpu_rq(cpu)->curr);
- if (p) {
- struct mm_struct *mm = READ_ONCE(p->mm);
+ if (!p)
+ goto next;
+ mm = READ_ONCE(p->mm);
+ if (!mm)
+ goto next;

- if (mm && (atomic_read(&mm->membarrier_state) &
- MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
- if (!fallback)
- __cpumask_set_cpu(cpu, tmpmask);
- else
- smp_call_function_single(cpu, ipi_mb, NULL, 1);
- }
+ /*
+ * Read membarrier_state with probe_kernel_address() rather than
+ * atomic_read(), because the mm may have been concurrently freed.
+ * If probe_kernel_address() fails, the mm struct was freed, so
+ * there is no need to issue a barrier on that particular CPU:
+ * the scheduler code takes care of it.
+ *
+ * It does not matter whether probe_kernel_address() reads
+ * membarrier_state piece-wise, given that we only care about
+ * testing a single bit.
+ */
+ if (probe_kernel_address(&mm->membarrier_state.counter,
+ membarrier_state))
+ membarrier_state = 0;
+ if (membarrier_state & MEMBARRIER_STATE_GLOBAL_EXPEDITED) {
+ if (!fallback)
+ __cpumask_set_cpu(cpu, tmpmask);
+ else
+ smp_call_function_single(cpu, ipi_mb, NULL, 1);
}
+ next:
rcu_read_unlock();
}
if (!fallback) {
--
2.17.1