Re: [PATCH] smp_call_function_many: handle concurrent clearing of mask

From: Mike Galbraith
Date: Mon Jan 31 2011 - 22:16:01 EST


On Mon, 2011-01-31 at 14:26 -0600, Milton Miller wrote:
> On Mon, 31 Jan 2011 about 08:21:22 +0100, Mike Galbraith wrote:
> > Wondering if a final sanity check makes sense. I've got a perma-spin
> > bug where comment apparently happened. Another CPU's diddle the mask
> > IPI may make this CPU do horrible things to itself as it's setting up to
> > IPI others with that mask.
> >
> > ---
> > kernel/smp.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > Index: linux-2.6.38.git/kernel/smp.c
> > ===================================================================
> > --- linux-2.6.38.git.orig/kernel/smp.c
> > +++ linux-2.6.38.git/kernel/smp.c
> > @@ -490,6 +490,9 @@ void smp_call_function_many(const struct
> > cpumask_and(data->cpumask, mask, cpu_online_mask);
> > cpumask_clear_cpu(this_cpu, data->cpumask);
> >
> > + /* Did you pass me a mask that can be changed/emptied under me? */
> > + BUG_ON(cpumask_empty(data->cpumask));
> > +
>
> I was thinking of this as "the ipi cpumask was cleared", but I realize now
> you are saying the caller passed in a cpumask, but between the cpu_first/
> cpu_next calls above and the cpumask_and another cpu cleared all the cpus?
>
> I could see how that could happen on say a mask of cpus that might have a
> translation context, or cpus that need a push to complete an rcu window.
> Instead of the BUG_ON, we can handle the mask being cleared.
>
> The arch code to send the IPI must handle an empty mask, as the other
> cpus are racing to clear their bit while it's trying to send the IPI.
> In fact that expected race is the cause of the x86 warning in bz 23042
> https://bugzilla.kernel.org/show_bug.cgi?id=23042 that Andrew pointed
> out.
>
>
> How about this [untested] patch?
>
> Mike Galbraith reported finding a lockup where apparently the passed-in
> cpumask was cleared on other cpu(s) while this cpu was preparing its
> smp_call_function_many block. Detect this race and unlock the call
> data block. Note: arch_send_call_function_ipi_mask must still handle an
> empty mask because the element is globally visible before it is called.
> And obviously there are no guarantees to which cpus are notified if the
> mask is changed during the call.

Yes, that would work. In my case, it was passed mm_cpumask(mm). What
is unclear is whether the mask as it stood at call time was what the
programmer needed acted on, i.e. the mask changing underneath the call
may be an intolerable loss/gain of information.
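
To make the race concrete, here is a rough user-space sketch of the
"detect the emptied mask and unlock" idea. It is not the kernel code and
not the proposed patch: sim_cpumask_t, struct sim_call_data and
sim_prepare_call() are hypothetical stand-ins, a 64-bit word plays the
role of the cpumask, and a plain bool plays the role of the call data
lock. It only shows the ordering: take one snapshot of the caller's mask,
drop this cpu, and back out if nothing is left.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per cpu, at most 64 cpus. */
typedef _Atomic uint64_t sim_cpumask_t;

/* Hypothetical stand-in for the per-cpu call data block. */
struct sim_call_data {
	uint64_t cpumask;	/* private snapshot of the target cpus */
	bool locked;		/* stand-in for the call data lock flag */
};

/*
 * Snapshot the caller's mask once, restrict it to online cpus and
 * drop the calling cpu.  If another cpu emptied the mask in the
 * meantime, release the data block instead of queueing it.
 *
 * Returns true if there are cpus left to notify, false otherwise.
 */
static bool sim_prepare_call(struct sim_call_data *data,
			     sim_cpumask_t *mask,
			     uint64_t online, unsigned int this_cpu)
{
	assert(this_cpu < 64);

	data->locked = true;

	/* Single read of the shared mask; later changes are not seen. */
	data->cpumask = atomic_load(mask) & online;
	data->cpumask &= ~(UINT64_C(1) << this_cpu);

	if (data->cpumask == 0) {
		/* Mask was emptied under us: unlock and send nothing. */
		data->locked = false;
		return false;
	}

	return true;
}
```

Note that the snapshot only decides who gets notified; it says nothing
about whether the cpus that dropped out of the mask in between still
needed the call, which is the open question above.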

-Mike

