Re: [PATCH] synchronize_irq needs a barrier

From: Herbert Xu
Date: Fri Oct 19 2007 - 00:20:48 EST


On Thu, Oct 18, 2007 at 08:26:45PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 18 Oct 2007, Linus Torvalds wrote:
> >
> > I *think* it should work with something like
> >
> > for (;;) {
> > smp_rmb();
> > if (!spin_is_locked(&desc->lock)) {
> > smp_rmb();
> > if (!(desc->status & IRQ_INPROGRESS)
> > break;
> > }
> > cpu_relax();
> > }
>
> I'm starting to doubt this.
>
> One of the issues is that we still need the smp_mb() in front of the loop
> (because we want to serialize the loop with any writes in the caller).

Right. I think if we accept the definition of a spin lock/unlock
that Nick outlined earlier, then we can see that on the IRQ path
there simply isn't a memory barrier between the changing of the
status field and the execution of the action:

spin_lock
set IRQ_INPROGRESS
spin_unlock

action

spin_lock
clear IRQ_INPROGRESS
spin_unlock

What may happen is that action can either float upwards to give

spin_lock
action
set IRQ_INPROGRESS
spin_unlock

spin_lock
clear IRQ_INPROGRESS
spin_unlock

or it can float downwards to give

spin_lock
set IRQ_INPROGRESS
spin_unlock

spin_lock
clear IRQ_INPROGRESS
action
spin_unlock

As a result we can add as many barriers as we want on the slow
(synchronize_irq) path and it just won't do any good (not unless
we add some barriers on the IRQ path == fast path).

What we do have on the right though is the fact in some ways
action behaves as if it's inside the spin lock even though it's
not. In particular, action cannot float up past the first spin
lock nor can it float down past the last spin unlock.

> See commit fa490cfd15d7ce0900097cc4e60cfd7a76381138 and ponder. Maybe we
> should take the same approach here, and do something like
>
> repeat:
> /* Optimistic, no-locking loop */
> while (desc->status & IRQ_INPROGRESS)
> cpu_relax();
>
> /* Ok, that indicated we're done: double-check carefully */
> spin_lock_irqsave(&desc->lock, flags);
> status = desc->status;
> spin_unlock_irqrestore(&desc->lock, flags);
>
> /* Oops, that failed? */
> if (status & IRQ_INPROGRESS)
> goto repeat;

That's why I think this patch is in fact the only one that
solves all the races in this thread. The case that it solves
which the lock/unlock patch does not is the one where action
flows downwards past the clearing of IRQ_INPROGRESS. I missed
this case earlier.

In fact this bug exists elsewhere too. For example, the network
stack does this in net/sched/sch_generic.c:

/* Wait for outstanding qdisc_run calls. */
while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
yield();

This has the same problem as the current synchronize_irq code.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/