[patch] killed tqueue_lock spinlock, Re: [patch] race-fix for bottom-half-functions [Re: Subtle trou

Andrea Arcangeli (andrea@e-mind.com)
Sun, 31 Jan 1999 02:31:01 +0100 (CET)


But doing that we can't run run_task_queue() inside an irq handler anymore,
because we risk getting interrupted before we have time to clear bit1.
This is _the_ bug: most bottom halves are run from the timer irq handler. The
only way to avoid deadlocking is to __cli() on the local CPU before setting
bit1 of ->sync and __sti() again after clearing it, but then you are back to
a __cli()... which is the thing you were trying to avoid by developing this
patch, I guess...
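
To make the window concrete, here is a quick sketch of what I mean: the
queue_task() from the diff below, renamed for illustration, with the
__cli()/__sti() pair left out:

extern __inline__ void queue_task_racy(struct tq_struct *bh_pointer,
				       task_queue *bh_list)
{
	if (!test_and_set_bit(0,&bh_pointer->sync)) {
		bh_pointer->sync |= 2;	/* bit1: ->next not valid yet */
		wmb();
		bh_pointer->next = xchg(bh_list,bh_pointer);
		/*
		 * <-- timer irq here: the element is already visible on
		 * the list, so a run_task_queue() run from the irq
		 * handler spins forever on test_bit(1,&save_p->sync),
		 * because the clear_bit() below can only run after the
		 * irq handler returns on this CPU.
		 */
		clear_bit(1,&bh_pointer->sync);
		wmb();
	}
}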

Once the bugs are fixed it's still a nice performance optimization, though.

I applied and fixed it here. Here is the diff against my current tree:

Index: include/linux/tqueue.h
===================================================================
RCS file: /var/cvs/linux/include/linux/tqueue.h,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 tqueue.h
--- tqueue.h 1999/01/18 01:33:03 1.1.2.1
+++ linux/include/linux/tqueue.h 1999/01/31 02:19:47
@@ -25,17 +25,9 @@
* - Bottom halfs are implemented as a linked list. You can have as many
* of them, as you want.
* - No more scanning of a bit field is required upon call of a bottom half.
- * - Support for chained bottom half lists. The run_task_queue() function can be
- * used as a bottom half handler. This is for example useful for bottom
- * halfs, which want to be delayed until the next clock tick.
- *
- * Problems:
- * - The queue_task_irq() inline function is only atomic with respect to itself.
- * Problems can occur, when queue_task_irq() is called from a normal system
- * call, and an interrupt comes in. No problems occur, when queue_task_irq()
- * is called from an interrupt or bottom half, and interrupted, as run_task_queue()
- * will not be executed/continued before the last interrupt returns. If in
- * doubt, use queue_task(), not queue_task_irq().
+ * - Support for chained bottom half lists. The run_task_queue() function
+ * can be used as a bottom half handler. This is for example useful for
+ * bottom halfs, which want to be delayed until the next clock tick.
* - Bottom halfs are called in the reverse order that they were linked into
* the list.
*/
@@ -75,20 +67,27 @@
* interrupt.
*/

-extern spinlock_t tqueue_lock;
-
/*
- * queue_task
+ * bh_pointer->sync description:
+ * - BIT0 set indicates that the bh function is already in the queue.
+ * - BIT1 set indicates that bh_pointer->next is not yet ready to be read.
*/
+
extern __inline__ void queue_task(struct tq_struct *bh_pointer,
task_queue *bh_list)
{
if (!test_and_set_bit(0,&bh_pointer->sync)) {
- unsigned long flags;
- spin_lock_irqsave(&tqueue_lock, flags);
- bh_pointer->next = *bh_list;
- *bh_list = bh_pointer;
- spin_unlock_irqrestore(&tqueue_lock, flags);
+ bh_pointer->sync |= 2;
+ wmb();
+ /*
+ * Disable local interrupts to avoid the CPU running a nested
+ * run_task_queue() while bit1 is set. -arca
+ */
+ __cli();
+ bh_pointer->next = xchg(bh_list,bh_pointer);
+ clear_bit(1,&bh_pointer->sync); /* see run_task */
+ wmb();
+ __sti();
}
}

@@ -98,14 +97,9 @@
extern __inline__ void run_task_queue(task_queue *list)
{
if (*list) {
- unsigned long flags;
struct tq_struct *p;

- spin_lock_irqsave(&tqueue_lock, flags);
- p = *list;
- *list = NULL;
- spin_unlock_irqrestore(&tqueue_lock, flags);
-
+ p = xchg(list,NULL);
while (p) {
void *arg;
void (*f) (void *);
@@ -113,8 +107,15 @@
arg = p -> data;
f = p -> routine;
save_p = p;
+ /*
+ * queue_task() may have put p on the list
+ * but not yet glued the rest of the list
+ * back onto p
+ */
+ while(test_bit(1,&save_p->sync));
+ rmb();
p = p -> next;
- mb();
+ wmb();
save_p -> sync = 0;
(*f)(arg);
}
Index: kernel/sched.c
===================================================================
RCS file: /var/cvs/linux/kernel/sched.c,v
retrieving revision 1.1.2.8
diff -u -r1.1.2.8 sched.c
--- sched.c 1999/01/29 15:57:36 1.1.2.8
+++ linux/kernel/sched.c 1999/01/31 02:12:43
@@ -1046,8 +1049,6 @@
sti();
}
}
-
-spinlock_t tqueue_lock;

void tqueue_bh(void)
{
Index: kernel/ksyms.c
===================================================================
RCS file: /var/cvs/linux/kernel/ksyms.c,v
retrieving revision 1.1.2.6
diff -u -r1.1.2.6 ksyms.c
--- ksyms.c 1999/01/26 01:52:38 1.1.2.6
+++ linux/kernel/ksyms.c 1999/01/31 02:12:43
@@ -277,7 +277,6 @@

#ifdef __SMP__
/* Various random spinlocks we want to export */
-EXPORT_SYMBOL(tqueue_lock);
EXPORT_SYMBOL(waitqueue_lock);
#endif

I can't see deadlocks or races in the fixed patch now. It still needs one
__cli(), but it's much better than before, do you agree?
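
Nothing changes for the callers btw; a driver keeps queueing tasks the same
way as before, something like this (hypothetical foo_* names, using
tq_immediate as the example queue):

#include <linux/sched.h>
#include <linux/interrupt.h>
#include <linux/tqueue.h>

static void foo_bh_handler(void *data)
{
	/* runs later, from the immediate bottom half, not in the irq */
}

static struct tq_struct foo_task = { NULL, 0, foo_bh_handler, NULL };

static void foo_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	queue_task(&foo_task, &tq_immediate);	/* no spinlock anymore */
	mark_bh(IMMEDIATE_BH);	/* run tq_immediate at bottom half time */
}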

I am running with it applied starting from now.

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/