Re: [patch] SMP race fix [was Re: SMP lockup & 3c509 on 2.2.x [aka.

Andrea Arcangeli (andrea@e-mind.com)
Wed, 5 May 1999 14:18:48 +0200 (CEST)


On Wed, 5 May 1999, Andrea Arcangeli wrote:

>I think I seen the race that maybe was the source of the popular SMP
>network lockup with code like: disable_irq(); spin_lock(); ...
>spin_unlock(); enable_irq().
>
>As first here it is my untested (I can't reproduce here) fix that will
>apply cleanly to 2.2.7:
>
>Index: arch/i386/kernel/irq.c
>===================================================================
>RCS file: /var/cvs/linux/arch/i386/kernel/irq.c,v
>retrieving revision 1.1.2.24
>diff -u -r1.1.2.24 irq.c
>--- linux/arch/i386/kernel/irq.c 1999/05/04 15:56:32 1.1.2.24
>+++ linux/arch/i386/kernel/irq.c 1999/05/05 09:43:20
>@@ -767,7 +767,7 @@
> }
> spin_unlock_irqrestore(&irq_controller_lock, flags);
>
>- if (irq_desc[irq].status & IRQ_INPROGRESS)
>+ while (irq_desc[irq].status & IRQ_INPROGRESS)
> synchronize_irq();
> }

Ah! The code above is showing some bad design of the IRQ_INPROGRESS stuff.

Basically the problem is that we are enabling INPROGRESS even if we just
seen the IRQ_DISABLED big set. So we may had an irq spinning on the
controller lock while we set IRQ_DISABLED from disable_irq(), and such irq
then will set IRQINPROGRESS while IRQ_DISABLED was set, but then it will
return without clear IRQ_INPROGRESS and so the in-progress bit will never
be cleared anymore (because nobody will ever go ahead in the irq code).

So this my other patch is needed to fix the design of the irq-inprogress
bit:

Index: arch/i386/kernel//io_apic.c
===================================================================
RCS file: /var/cvs/linux/arch/i386/kernel/io_apic.c,v
retrieving revision 1.1.2.8
diff -u -r1.1.2.8 io_apic.c
--- linux/arch/i386/kernel/io_apic.c 1999/04/17 14:58:41 1.1.2.8
+++ io_apic.c 1999/05/05 11:47:30
@@ -1060,8 +1060,9 @@
if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
action = desc->action;
status &= ~IRQ_PENDING;
+ status |= IRQ_INPROGRESS;
}
- desc->status = status | IRQ_INPROGRESS;
+ desc->status = status;
spin_unlock(&irq_controller_lock);

/*
@@ -1112,8 +1113,9 @@
action = NULL;
if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
action = desc->action;
+ status |= IRQ_INPROGRESS;
}
- desc->status = status | IRQ_INPROGRESS;
+ desc->status = status;

ack_APIC_irq();
spin_unlock(&irq_controller_lock);
Index: arch/i386/kernel//irq.c
===================================================================
RCS file: /var/cvs/linux/arch/i386/kernel/irq.c,v
retrieving revision 1.1.2.25
diff -u -r1.1.2.25 irq.c
--- linux/arch/i386/kernel/irq.c 1999/05/05 10:26:44 1.1.2.25
+++ irq.c 1999/05/05 12:01:52
@@ -242,8 +242,11 @@
status = desc->status & ~IRQ_REPLAY;
action = NULL;
if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))
+ {
action = desc->action;
- desc->status = status | IRQ_INPROGRESS;
+ status |= IRQ_INPROGRESS;
+ }
+ desc->status = status;
}
spin_unlock(&irq_controller_lock);

@@ -778,7 +781,7 @@
spin_lock_irqsave(&irq_controller_lock, flags);
switch (irq_desc[irq].depth) {
case 1:
- irq_desc[irq].status &= ~(IRQ_DISABLED | IRQ_INPROGRESS);
+ irq_desc[irq].status &= ~IRQ_DISABLED;
irq_desc[irq].handler->enable(irq);
/* fall throught */
default:
@@ -960,7 +963,7 @@
for (i = NR_IRQS-1; i > 0; i--) {
if (!irq_desc[i].action) {
unsigned int status = irq_desc[i].status | IRQ_AUTODETECT;
- irq_desc[i].status = status & ~IRQ_INPROGRESS;
+ irq_desc[i].status = status & ~(IRQ_INPROGRESS|IRQ_DISABLED);
irq_desc[i].handler->startup(i);
}
}
@@ -985,6 +988,7 @@
/* It triggered already - consider it spurious. */
if (status & IRQ_INPROGRESS) {
irq_desc[i].status = status & ~IRQ_AUTODETECT;
+ irq_desc[i].status |= IRQ_DISABLED;
irq_desc[i].handler->shutdown(i);
}
}
@@ -1015,6 +1019,7 @@
nr_irqs++;
}
irq_desc[i].status = status & ~IRQ_AUTODETECT;
+ irq_desc[i].status |= IRQ_DISABLED;
irq_desc[i].handler->shutdown(i);
}
spin_unlock_irq(&irq_controller_lock);
Index: arch/i386/kernel//visws_apic.c
===================================================================
RCS file: /var/cvs/linux/arch/i386/kernel/visws_apic.c,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 visws_apic.c
--- linux/arch/i386/kernel/visws_apic.c 1999/01/24 02:46:09 1.1.2.1
+++ visws_apic.c 1999/05/05 11:47:30
@@ -204,8 +204,11 @@
status = desc->status & ~IRQ_REPLAY;
action = NULL;
if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))
+ {
action = desc->action;
- desc->status = status | IRQ_INPROGRESS;
+ status |= IRQ_INPROGRESS;
+ }
+ desc->status = status;
}
spin_unlock(&irq_controller_lock);


This my second incremental patch make sure to have a relialable
IRQ_INPROGRESS value, to allow us to safely loop while(in_progress) at the
end of disable_irq().

Really with this new patch I removed the need to clear the IRQINPROGRESS
bit upon irq_enable() and so I think this patch alone would just fix the
SMP race (the problem is that we was doing a enable_irq while the irq was
running and so we was allowing a second irq to reenter nested in the irq
handler).

The important thing to not lockup in the network card case is not to have
the irq that reenter in itself (as it was happening) or to not have the
irq running over the bh code. We don't care if the irq is running on the
other CPU. So basically I think that with this second patch above we
woudn't need to synchronize with irqs at all. So I think we could add a
new call to disable_irq_nosync(irq) that simply forget to
`while(irq_inprogress) sync_irq', but it will only care that the irq won't
happens in the same CPU after disable_irq() is finished. This will get us
performances. And as just said the nework code will be allowed to use
disable_irq_nosync because it just uses a spinlock to SMP synchronize.

The plain disable_irq() instead still need a while(irq_inprogress) loop I
proposed in my first patch according to me.

If you apply both the two patches you should go stable though...

Now I have to shutdown the computer (I had to go to University ...), I'll
continue to think about the problem in the late evening.

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/