Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)

From: Paul E. McKenney
Date: Mon Apr 20 2015 - 08:17:40 EST


On Mon, Apr 20, 2015 at 07:39:55AM +0200, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > >
> > > That's all fine and good, but why is an IPI sent to a non-existent
> > > CPU? It's not like we don't know which CPUs are up and which are down.
> >
> > I agree that the qemu-arm behavior smells like a bug, plain and
> > simple. Nobody sane should send random IPI's to CPU's that they
> > don't even know are up or not.
>
> Yeah, and a warning would have caught that bug a bit earlier, at the
> cost of restricting the API:
>
> > That said, I could imagine that we would have some valid case where
> > we just do a cross-cpu call to (for example) do lock wakeup, and the
> > code would use some optimistic algorithm that prods the CPU after
> > the lock has been released, and there could be some random race
> > where the lock data structure has already been released (ie imagine
> > the kind of optimistic unlocked racy access to "sem->owner" that we
> > discussed as part of the rwsem_spin_on_owner() thread recently).
> >
> > So I _could_ imagine that somebody would want to do optimistic "prod
> > other cpu" calls that in all normal cases are for existing cpus, but
> > could be racy in theory.
>
> Yes, and I don't disagree with such optimizations in principle (it
> allows fewer references to be taken in the fast path), but is it really
> safe?
>
> If a CPU is going down and we potentially race against that, and send
> off an IPI, the IPI might be 'in flight' for an indeterminate amount
> of time, especially on wildly non-deterministic hardware like virtual
> platforms.
>
> So if the CPU goes down during that time, the typical way a CPU goes
> down is that it ignores all IPIs that arrive after that. End result:
> the sender of the IPI may hang indefinitely (for the synchronous API),
> waiting for the CSD lock to be released...
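
For reference, the waiting side really is just a spin on CSD_FLAG_LOCK,
with no timeout and no check of the target CPU's state. Roughly, per
the kernel/smp.c of that era (a simplified sketch, details may vary):

	/* Sender spins until the target clears CSD_FLAG_LOCK. */
	static void csd_lock_wait(struct call_single_data *csd)
	{
		while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
			cpu_relax();
	}

So if the target never drains its call queue, the sender spins forever.
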
>
> For the non-wait API we could also hang, but in an even more
> interesting way: we won't hang trying to send to the downed CPU, we'd
> hang on the _next_ cross-call, possibly sending to another CPU,
> because the CSD_FLAG_LOCK of the sender CPU is never released by the
> offlined CPU which was supposed to unlock it.
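
And the reason the _next_ caller is the one that hangs: locking a csd
starts by waiting out its previous use, along these lines (again a
simplified sketch of the kernel/smp.c scheme):

	/* Acquiring a csd first waits for its previous user. */
	static void csd_lock(struct call_single_data *csd)
	{
		csd_lock_wait(csd);
		csd->flags |= CSD_FLAG_LOCK;

		/*
		 * Order the ->flags store before any later stores to
		 * the csd's other fields.
		 */
		smp_wmb();
	}

so a CSD_FLAG_LOCK leaked by the offlined CPU stalls whoever reuses
that csd, not the call that actually lost the IPI.
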
>
> Also note the existing warning we already have in
> flush_smp_call_function_queue():
>
> 	/* There shouldn't be any pending callbacks on an offline CPU. */
> 	if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
> 		     !warned && !llist_empty(head))) {
> 		warned = true;
> 		WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
>
> Couldn't this trigger in the opportunistic, imprecise IPI case, if
> IRQs are ever enabled when or after the CPU is marked offline?
>
> My suggested warning would simply catch this kind of unsafe looking
> race a bit sooner, instead of silently ignoring the cross-call
> attempt.
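
Concretely, such an early warning would sit at IPI-send time, next to
the offline check in generic_exec_single(); something like the
following sketch (the exact placement and message are illustrative):

	/* Sketch: warn at send time instead of failing silently. */
	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
		WARN_ONCE(1, "cross-call to offline/non-existent CPU %d\n",
			  cpu);
		csd_unlock(csd);
		return -ENXIO;
	}

That would have made the qemu-arm case complain on the very first
cross-call attempt.
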
>
> Now your suggestion could be made to work by:
>
> - polling for CPU status in the CSD-lock code as well. (We don't do
> this currently; see the sketch after this list.)
>
> - making sure that if a late IPI arrives to an already-down CPU it
> does not attempt to execute CSD functions. (I.e. silence the
> above, already existing warning.)
>
> - auditing the CPU-down path to make sure it does not get surprised
> by late external IPIs. (I think this should already be so, given
> the existing warning.)
>
> - IPIs can be pending indefinitely, so make sure a pending IPI won't
> confuse the machinery after a CPU has been onlined again.
>
> - pending IPIs may consume hardware resources when not received
> properly. For example I think x86 will keep trying to resend them.
> Not sure there's any timeout mechanism on the hardware level - the
> sending APIC might even get stuck? Audit all hardware for this
> kind of possibility.
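
For the first item, polling for CPU status in the CSD-lock code could
look something like this (a purely hypothetical sketch; bailing out of
the wait is itself subtle, because the csd may still sit on the dead
CPU's call queue):

	/* Hypothetical: give up the wait if the target goes offline. */
	static int csd_lock_wait_cpu(struct call_single_data *csd, int cpu)
	{
		while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK) {
			if (!cpu_online(cpu))
				return -ENXIO;	/* target died under us */
			cpu_relax();
		}
		return 0;
	}
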
>
> So unless I'm missing something, to me it looks like the current
> code is only safe to be used on offline CPUs if the 'offline' CPU is
> never ever brought online in the first place.
>
> > It doesn't sound like the qemu-arm case is that kind of situation,
> > though. That one just sounds like a stupid "let's send an ipi to a
> > cpu whether it exists or not".
> >
> > But I committed it without any new warning, because I could in theory
> > see it as being a valid use.
>
> So my point is that we already have a 'late' warning, and I think it
> actively hid the qemu-arm bug, because the 'offline' CPU was so
> offline (due to being non-existent) that it never had a chance to
> print a warning.
>
> With an 'early' warning we could flush out such bugs (or code
> uncleanliness: clearly the ARM code was at least functional before)
> a bit sooner.
>
> But I don't have strong feelings about it, SMP cross-call users are a
> lot less frequent than, say, locking facility users, so the strength of
> debugging isn't nearly as important and it's fine to me either way!

In the long term, support for "send an IPI to a CPU that might or might
not be leaving" will probably need an API that returns a success or
failure indication. This will probably simplify things a bit, because
currently the caller needs to participate in the IPI's synchronization.
With a conditional primitive, callers could instead simply rely on the
return value.
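
For illustration, a caller of such a conditional primitive might then
look like this (sketch only; prod_cpu_func() and prod_locally() are
made-up stand-ins, and the exact error value is an assumption):

	static struct call_single_data prod_csd = {
		.func = prod_cpu_func,
	};

	/*
	 * Try to prod a possibly-departing CPU. If the primitive
	 * reports failure (say -ENXIO for an offline target), fall
	 * back locally instead of waiting on a CSD lock that might
	 * never be released.
	 */
	if (smp_call_function_single_async(cpu, &prod_csd) < 0)
		prod_locally();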

Thanx, Paul
