Re: [patch V2 1/8] x86/smp: Make stop_other_cpus() more robust

From: Ashok Raj
Date: Wed Jun 14 2023 - 16:47:22 EST


On Wed, Jun 14, 2023 at 09:53:21PM +0200, Thomas Gleixner wrote:
> > If we go down the INIT path, life is less complicated..
> >
> > After REBOOT_VECTOR IPI, if stop_cpus_count > 0, we send NMI to all CPUs.
> > Won't this corrupt the count? CPUs already parked in hlt() will also
> > take the NMI and run atomic_dec() a second time, correct? I'm not sure
> > whether that is problematic.
> >
> > Or should we reinitialize stop_cpus_count before the NMI hurrah?
>
> Bah. Didn't think about HLT. Let me go back to the drawing board. Good catch!
>
> >> + /*
> >> + * Ensure that the cache line is invalidated on the other CPUs. See
> >> + * comment vs. SME in stop_this_cpu().
> >> + */
> >> + atomic_set(&stop_cpus_count, INT_MAX);
> >
> > Didn't understand why INT_MAX here?
>
> Any random number will do. The only purpose is to ensure that there is
> no dirty cache line on the other (stopped) CPUs.
>
> Now let me look into this NMI cruft.
>

Maybe if each CPU going down sets its bit in a mask, we can send the NMI
only to the problematic ones?

A simple count doesn't identify which CPUs are in trouble.