Re: [PATCH] stop_machine: avoid potential race behaviour

From: Marco Elver
Date: Mon Oct 07 2019 - 07:01:49 EST


On Mon, 7 Oct 2019 at 12:45, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> Both multi_cpu_stop() and set_state() access multi_stop_data::state
> racily using plain accesses. These are subject to compiler
> transformations which could break the intended behaviour of the code,
> and this situation is detected by KCSAN on both arm64 and x86 (splats
> below).
>
> Let's improve matters by using READ_ONCE() and WRITE_ONCE() to ensure
> that the compiler cannot elide, replay, or tear loads and stores. In
> multi_cpu_stop() we expect the two loads of multi_stop_data::state to be
> a consistent value, so we snapshot the value into a temporary variable
> to ensure this.
>
> The state transitions are serialized by atomic manipulation of
> multi_stop_data::num_threads, and other fields in multi_stop_data are
> not modified while subject to concurrent reads.
>
> KCSAN splat on arm64:
>
> | BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
> |
> | write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
> | set_state+0x80/0xb0
> | multi_cpu_stop+0x16c/0x198
> | cpu_stopper_thread+0x170/0x298
> | smpboot_thread_fn+0x40c/0x560
> | kthread+0x1a8/0x1b0
> | ret_from_fork+0x10/0x18
> |
> | read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
> | multi_cpu_stop+0xa8/0x198
> | cpu_stopper_thread+0x170/0x298
> | smpboot_thread_fn+0x40c/0x560
> | kthread+0x1a8/0x1b0
> | ret_from_fork+0x10/0x18
> |
> | Reported by Kernel Concurrency Sanitizer on:
> | CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
> | Hardware name: linux,dummy-virt (DT)
>
> KCSAN splat on x86:
>
> | write to 0xffffb0bac0013e18 of 4 bytes by task 19 on cpu 2:
> | set_state kernel/stop_machine.c:170 [inline]
> | ack_state kernel/stop_machine.c:177 [inline]
> | multi_cpu_stop+0x1a4/0x220 kernel/stop_machine.c:227
> | cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
> | smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
> | kthread+0x1b5/0x200 kernel/kthread.c:255
> | ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
> |
> | read to 0xffffb0bac0013e18 of 4 bytes by task 44 on cpu 7:
> | multi_cpu_stop+0xb4/0x220 kernel/stop_machine.c:213
> | cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
> | smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
> | kthread+0x1b5/0x200 kernel/kthread.c:255
> | ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
> |
> | Reported by Kernel Concurrency Sanitizer on:
> | CPU: 7 PID: 44 Comm: migration/7 Not tainted 5.3.0+ #1
> | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Marco Elver <elver@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>

Thanks for fixing this!

Acked-by: Marco Elver <elver@xxxxxxxxxx>

> ---
> kernel/stop_machine.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index c7031a22aa7b..998d50ee2d9b 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -7,6 +7,7 @@
> * Copyright (C) 2010 SUSE Linux Products GmbH
> * Copyright (C) 2010 Tejun Heo <tj@xxxxxxxxxx>
> */
> +#include <linux/compiler.h>
> #include <linux/completion.h>
> #include <linux/cpu.h>
> #include <linux/init.h>
> @@ -167,7 +168,7 @@ static void set_state(struct multi_stop_data *msdata,
> /* Reset ack counter. */
> atomic_set(&msdata->thread_ack, msdata->num_threads);
> smp_wmb();
> - msdata->state = newstate;
> + WRITE_ONCE(msdata->state, newstate);
> }
>
> /* Last one to ack a state moves to the next state. */
> @@ -186,7 +187,7 @@ void __weak stop_machine_yield(const struct cpumask *cpumask)
> static int multi_cpu_stop(void *data)
> {
> struct multi_stop_data *msdata = data;
> - enum multi_stop_state curstate = MULTI_STOP_NONE;
> + enum multi_stop_state newstate, curstate = MULTI_STOP_NONE;
> int cpu = smp_processor_id(), err = 0;
> const struct cpumask *cpumask;
> unsigned long flags;
> @@ -210,8 +211,9 @@ static int multi_cpu_stop(void *data)
> do {
> /* Chill out and ensure we re-read multi_stop_state. */
> stop_machine_yield(cpumask);
> - if (msdata->state != curstate) {
> - curstate = msdata->state;
> + newstate = READ_ONCE(msdata->state);
> + if (newstate != curstate) {
> + curstate = newstate;
> switch (curstate) {
> case MULTI_STOP_DISABLE_IRQ:
> local_irq_disable();
> --
> 2.11.0
>