[BUG] racy access to p->mm in membarrier_global_expedited()

From: Jann Horn
Date: Fri Jan 25 2019 - 12:27:22 EST


membarrier_global_expedited() runs the following code (introduced in
commit c5f58bd58f43), protected only by an RCU read-side critical
section and the cpu_hotplug_lock:

p = task_rcu_dereference(&cpu_rq(cpu)->curr);
if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
if (!fallback)
__cpumask_set_cpu(cpu, tmpmask);
else
smp_call_function_single(cpu, ipi_mb, NULL, 1);
}

p->mm is not protected by either lock. This means that in theory, the
following races could occur:

1. If the compiler emitted two separate reads of ->mm, the second read
of p->mm could return a NULL pointer and crash.
2. If the mm is deallocated directly before the atomic_read() occurs,
the atomic_read() could access a freed pointer (I think?).

Neither of these are particularly likely - looking at the assembly of
a normal build, the first race doesn't exist because the compiler
optimizes the second read away, and the second race isn't going to
cause anything particularly interesting. Still, this should probably
be fixed...

As far as I can tell, you'll have to either take the task_lock()
around the "p->mm && (atomic_read(&p->mm->membarrier_state)" or add
RCU to the lifetime of mm_struct. I'm not entirely sure what the
better fix is... probably task_lock() makes more sense?


To test the bug, I patched an extra delay into the code:

====================
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 3cd8a3a795d2..69cc52039576 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -14,6 +14,7 @@
* GNU General Public License for more details.
*/
#include "sched.h"
+#include <linux/delay.h>

/*
* Bitmask made from a "or" of all commands within enum membarrier_cmd,
@@ -81,7 +82,7 @@ static int membarrier_global_expedited(void)

rcu_read_lock();
p = task_rcu_dereference(&cpu_rq(cpu)->curr);
- if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
+ if (p && p->mm && (mdelay(100), 1) &&
(atomic_read(&p->mm->membarrier_state) &
MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
if (!fallback)
__cpumask_set_cpu(cpu, tmpmask);
====================

On a kernel with that patch applied, I ran this test code:

====================
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <linux/membarrier.h>
#include <err.h>

int main(void) {
while (1) {
printf("executing global expedited barrier...\n");
int res = syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
if (res) err(1, "barrier");
}
}
====================

That resulted in this splat:

[ 212.697681] ==================================================================
[ 212.700582] BUG: KASAN: null-ptr-deref in
membarrier_global_expedited+0x15f/0x220
[ 212.703346] Read of size 4 at addr 0000000000000378 by task barrier/1177

[ 212.706384] CPU: 1 PID: 1177 Comm: barrier Not tainted 5.0.0-rc3+ #246
[ 212.708925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[ 212.712263] Call Trace:
[ 212.713177] dump_stack+0x71/0xab
[ 212.714375] ? membarrier_global_expedited+0x15f/0x220
[ 212.716236] ? membarrier_global_expedited+0x15f/0x220
[ 212.718099] kasan_report+0x176/0x192
[ 212.719445] ? finish_task_switch+0x340/0x3d0
[ 212.721057] ? membarrier_global_expedited+0x15f/0x220
[ 212.722921] membarrier_global_expedited+0x15f/0x220
[ 212.724696] ? ipi_mb+0x10/0x10
[ 212.725816] ? vfs_write+0x120/0x230
[ 212.727113] ? __ia32_sys_read+0x50/0x50
[ 212.728596] __x64_sys_membarrier+0x85/0xf0
[ 212.730056] do_syscall_64+0x73/0x160
[ 212.731428] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 212.733236] RIP: 0033:0x7fbe8747e229
[ 212.734540] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3f 4c 2b 00 f7 d8 64 89
01 48
[ 212.741109] RSP: 002b:00007fffcb62a7c8 EFLAGS: 00000202 ORIG_RAX:
0000000000000144
[ 212.743831] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8747e229
[ 212.746335] RDX: 00007fbe87475730 RSI: 0000000000000000 RDI: 0000000000000002
[ 212.748855] RBP: 00007fffcb62a7e0 R08: 00007fffcb62a8c0 R09: 00007fffcb62a8c0
[ 212.751374] R10: 00007fbe8793c700 R11: 0000000000000202 R12: 0000563ee2ac9610
[ 212.753842] R13: 00007fffcb62a8c0 R14: 0000000000000000 R15: 0000000000000000
[ 212.756305] ==================================================================