Re: [PATCH v2] stop_machine: Avoid potential non-atomic read of multi_stop_data::state

From: Mark Rutland
Date: Fri Oct 20 2023 - 06:23:50 EST


On Thu, Oct 19, 2023 at 08:11:23AM +0800, Rong Tao wrote:
> From: Rong Tao <rongtao@xxxxxxxx>
>
> In commit b1fc58333575 ("stop_machine: Avoid potential race behaviour")
> fix both multi_cpu_stop() and set_state() access multi_stop_data::state,
> Pass curstate as a parameter to ack_state(), to avoid the non-atomic read.

Can we please describe this better? This is *not* a fix, it is a cleanup.

As I covered in:

https://lore.kernel.org/lkml/ZS5g6I-UtUnihToH@FVFF77S0Q05N/

... there are no concurrent writers, and so the value of multi_stop_data::state
cannot change, and a non-atomic read is fine.

The actual change looks good to me as it makes it easier to see that there's no
race.
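
To make that argument concrete, here is a rough single-threaded model of the
ack protocol. It is only a sketch: NUM_THREADS, state_writes, cpu_step() and
run_model() are hypothetical names that do not exist in the kernel, plain ints
stand in for atomic_t and READ_ONCE(), and the simulated CPUs are stepped
round-robin rather than running concurrently. The assert in ack_state() is the
invariant in question: by the time the last acker writes, nothing else can
have changed the state it observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; names mirror kernel/stop_machine.c. */
enum multi_stop_state {
	MULTI_STOP_NONE,
	MULTI_STOP_PREPARE,
	MULTI_STOP_DISABLE_IRQ,
	MULTI_STOP_RUN,
	MULTI_STOP_EXIT,
};

#define NUM_THREADS 4	/* assumption: number of simulated CPUs */

struct multi_stop_data {
	int num_threads;
	int thread_ack;			/* atomic_t in the kernel */
	enum multi_stop_state state;
	int state_writes;		/* model-only: counts writes to ->state */
};

static void set_state(struct multi_stop_data *msdata,
		      enum multi_stop_state newstate)
{
	/* Reset ack counter, then publish the new state. */
	msdata->thread_ack = msdata->num_threads;
	msdata->state = newstate;
	msdata->state_writes++;
}

/* Last one to ack a state moves to the next state. */
static void ack_state(struct multi_stop_data *msdata,
		      enum multi_stop_state curstate)
{
	if (--msdata->thread_ack == 0) {
		/* Every other CPU has acked, so nobody else wrote ->state. */
		assert(msdata->state == curstate);
		set_state(msdata, curstate + 1);
	}
}

/* One polling step of the multi_cpu_stop() loop for one simulated CPU. */
static void cpu_step(struct multi_stop_data *msdata,
		     enum multi_stop_state *curstate)
{
	enum multi_stop_state newstate = msdata->state;	/* READ_ONCE() */

	if (newstate != *curstate) {
		*curstate = newstate;
		ack_state(msdata, *curstate);
	}
}

/* Step all simulated CPUs until each has observed MULTI_STOP_EXIT. */
int run_model(struct multi_stop_data *msdata)
{
	enum multi_stop_state cur[NUM_THREADS] = { MULTI_STOP_NONE };
	bool done;

	msdata->num_threads = NUM_THREADS;
	msdata->state = MULTI_STOP_NONE;
	msdata->state_writes = 0;
	set_state(msdata, MULTI_STOP_PREPARE);

	do {
		done = true;
		for (int i = 0; i < NUM_THREADS; i++) {
			if (cur[i] != MULTI_STOP_EXIT)
				cpu_step(msdata, &cur[i]);
			if (cur[i] != MULTI_STOP_EXIT)
				done = false;
		}
	} while (!done);

	return msdata->state_writes;
}
```

Each transition is written exactly once, by whichever CPU acks last. Passing
curstate into ack_state() just makes that visible at the call site.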

> And replace smp_wmb()+WRITE_ONCE() with smp_store_release().

This is also fine, but feels like a logically separate change.
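
For reference, the two idioms can be sketched with C11 atomics. This is an
approximation only: C11 is not the Linux kernel memory model, the release
fence used to stand in for smp_wmb() is stronger than smp_wmb() (which orders
stores against stores only), and struct msdata_model and the function names
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for struct multi_stop_data. */
struct msdata_model {
	atomic_int thread_ack;
	atomic_int state;
	int num_threads;
};

/* Old form: barrier, then a relaxed store (smp_wmb() + WRITE_ONCE()). */
static void set_state_wmb(struct msdata_model *m, int newstate)
{
	atomic_store_explicit(&m->thread_ack, m->num_threads,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* stands in for smp_wmb() */
	atomic_store_explicit(&m->state, newstate, memory_order_relaxed);
}

/* New form: a single release store (smp_store_release()). */
static void set_state_release(struct msdata_model *m, int newstate)
{
	atomic_store_explicit(&m->thread_ack, m->num_threads,
			      memory_order_relaxed);
	atomic_store_explicit(&m->state, newstate, memory_order_release);
}

/* Both variants leave the same values behind; only the idiom differs. */
int set_state_variants_agree(void)
{
	struct msdata_model a = { .num_threads = 4 };
	struct msdata_model b = { .num_threads = 4 };

	set_state_wmb(&a, 2);
	set_state_release(&b, 2);
	assert(atomic_load(&a.thread_ack) == atomic_load(&b.thread_ack));
	return atomic_load(&a.state) == atomic_load(&b.state);
}
```

In both forms the ack counter reset is ordered before the state store, which
is the property set_state() relies on.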

Mark.

>
> Signed-off-by: Rong Tao <rongtao@xxxxxxxx>
> ---
> v1: stop_machine: Avoid potential race behaviour of multi_stop_data::state
> https://lore.kernel.org/lkml/tencent_705C16DF25978ACAEBD1E83E228881901006@xxxxxx/
> ---
> kernel/stop_machine.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index cedb17ba158a..35a122ce2cbd 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -183,15 +183,15 @@ static void set_state(struct multi_stop_data *msdata,
>  {
>  	/* Reset ack counter. */
>  	atomic_set(&msdata->thread_ack, msdata->num_threads);
> -	smp_wmb();
> -	WRITE_ONCE(msdata->state, newstate);
> +	smp_store_release(&msdata->state, newstate);
>  }
>  
>  /* Last one to ack a state moves to the next state. */
> -static void ack_state(struct multi_stop_data *msdata)
> +static void ack_state(struct multi_stop_data *msdata,
> +		      enum multi_stop_state curstate)
>  {
>  	if (atomic_dec_and_test(&msdata->thread_ack))
> -		set_state(msdata, msdata->state + 1);
> +		set_state(msdata, curstate + 1);
>  }
>
> notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> @@ -242,7 +242,7 @@ static int multi_cpu_stop(void *data)
>  			default:
>  				break;
>  			}
> -			ack_state(msdata);
> +			ack_state(msdata, curstate);
>  		} else if (curstate > MULTI_STOP_PREPARE) {
>  			/*
>  			 * At this stage all other CPUs we depend on must spin
> --
> 2.42.0
>