Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs

From: Peter Zijlstra
Date: Mon May 27 2019 - 06:56:58 EST


On Wed, May 22, 2019 at 02:23:47PM -0400, Rik van Riel wrote:
> On Wed, 2019-05-22 at 16:49 +0200, Peter Zijlstra wrote:
> > On Wed, May 22, 2019 at 03:37:11PM +0100, Andrew Murray wrote:
> > > > Is perhaps the problem that on_each_cpu_cond() uses
> > > > cpu_onlne_mask
> > > > without protection?
> > >
> > > Does this prevent racing with a CPU going offline? I guess this
> > > prevents
> > > the warning at the expense of a lock - but is only beneficial in
> > > the
> > > unlikely path. (In the likely path this prevents new CPUs going
> > > offline
> > > but we don't care because we don't WARN if they aren't they when we
> > > attempt to call functions).
> > >
> > > At least this is my limited understanding.
> >
> > Hmm.. I don't think it could matter, we only use the mask when
> > preempt_disable(), which would already block offline, due to it using
> > stop_machine().
> >
> > So the patch is a no-op.
> >
> > What's the WARN you see? TLB invalidation should pass mm_cpumask(),
> > which similarly should not contain offline CPUs I'm thinking.
>
> Does the TLB invalidation code have anything in it
> to prevent from racing with the CPU offline code?
>
> In other words, could we end up with the TLB
> invalidation code building its bitmask, getting
> interrupted (eg. hypervisor preemption, NMI),
> and not sending out the IPI to that bitmask of
> CPUs until after one of the CPUs in the bitmap
> has gotten offlined?

One possible thing would be if cpu-offline didn't remove the bit from
mm_cpumask() because it entered lazy state earlier or something.

Then, mm_cpumask() would contain an offline CPU and the WARN could
trigger, but I don't _think_ we do that, but I didn't check.