Re: [patch] disable_bh/enable_bh race fix [Re: Program to freeze keyboard in 2.1.131]

Andrea Arcangeli (andrea@e-mind.com)
Sun, 27 Dec 1998 17:58:55 +0100 (CET)


On Tue, 22 Dec 1998, Andrea Arcangeli wrote:

> This my patch I developed in the last hours (ugh even if it' s very late
> seems to work, strange ;)) fixes perfectly the console race here.

It still works _fine_ in practice but now I can see some other problem
that make me thought that could be not the right fix... The last night I
found _the_ cause of the race I was reproducing and I fixed _only_ it,
tried and worked, but now I thought a bit more about it...

> @@ -97,13 +97,13 @@
> extern inline void disable_bh(int nr)
> {
> bh_mask &= ~(1 << nr);
> - bh_mask_count[nr]++;
> + atomic_inc(&bh_mask_count[nr]);
> synchronize_bh();
> }
>
> extern inline void enable_bh(int nr)
> {
> - if (!--bh_mask_count[nr])
> + if (atomic_dec_and_test(&bh_mask_count[nr]))
> bh_mask |= 1 << nr;
> }
>

Note my first patch is doing nothing of wrong but I can still see a
theorical small window for a (less harmful in the console case) race in
the disable/enable_bh...

The bh_mask variable is not atomically modified with the bh_mask_count. So
theorically it could happens that if disable_bh is run a bit after
enable_bh, the bh handler get not really disabled because bh_mask |= 1 <<
nr could run a bit after than bh_mask &= ~(1 << nr)...

I would like to fix the race reverting completly my last atomic_t patch
and then adding a start_bh_atomic()/end_bh_atomic() around:

> - if (!--bh_mask_count[nr])
> bh_mask |= 1 << nr;

and around:

> bh_mask &= ~(1 << nr);
> - bh_mask_count[nr]++;

but start_bh_atomic() is not enough :( since the synchronize_bh() at the
end of start_bh_atomic() will do nothing if we are running in a irq
handler (due in_interrupt() implementation).

Since I think we should use in_interrupt() only to know if we can sleep or
not, and not to know if we can synchronize_bh or not, I changed/fixed that
bit in irq.c too and I fixed the disable/enable_bh race in my new
start/end_bh_atomic() way.

I think it's the right way also because the bh_mask will be read carefully
only from an atomic_bh context.

Here a new patch that fix the bug fine (as the first fix) against 2.1.132:

Index: arch/i386/kernel/irq.c
===================================================================
RCS file: /var/cvs/linux/arch/i386/kernel/irq.c,v
retrieving revision 1.1.1.1.2.10
diff -u -r1.1.1.1.2.10 irq.c
--- irq.c 1998/12/23 00:29:43 1.1.1.1.2.10
+++ linux/arch/i386/kernel/irq.c 1998/12/27 16:30:20
@@ -21,6 +21,11 @@
* Copyright (C) 1998 Andrea Arcangeli
*/

+/*
+ * synchronize_bh can't synchronize _only_ if we are in a bh handler.
+ * Copyright (C) 1998 Andrea Arcangeli
+ */
+
#include <linux/ptrace.h>
#include <linux/errno.h>
#include <linux/kernel_stat.h>
@@ -449,7 +454,8 @@
*/
void synchronize_bh(void)
{
- if (atomic_read(&global_bh_count) && !in_interrupt())
+ if (atomic_read(&global_bh_count) &&
+ !local_bh_count[smp_processor_id()])
wait_on_bh();
}

Index: include/asm-i386/softirq.h
===================================================================
RCS file: /var/cvs/linux/include/asm-i386/softirq.h,v
retrieving revision 1.1.1.1.2.3
diff -u -r1.1.1.1.2.3 softirq.h
--- softirq.h 1998/12/22 00:34:47 1.1.1.1.2.3
+++ linux/include/asm-i386/softirq.h 1998/12/27 16:21:31
@@ -12,7 +12,7 @@
extern inline void init_bh(int nr, void (*routine)(void))
{
bh_base[nr] = routine;
- atomic_set(&bh_mask_count[nr], 0);
+ bh_mask_count[nr] = 0;
bh_mask |= 1 << nr;
}

@@ -96,15 +96,18 @@
*/
extern inline void disable_bh(int nr)
{
+ start_bh_atomic();
bh_mask &= ~(1 << nr);
- atomic_inc(&bh_mask_count[nr]);
- synchronize_bh();
+ bh_mask_count[nr]++;
+ end_bh_atomic();
}

extern inline void enable_bh(int nr)
{
- if (atomic_dec_and_test(&bh_mask_count[nr]))
+ start_bh_atomic();
+ if (!--bh_mask_count[nr])
bh_mask |= 1 << nr;
+ end_bh_atomic();
}

#endif /* __ASM_SOFTIRQ_H */
Index: include/linux/interrupt.h
===================================================================
RCS file: /var/cvs/linux/include/linux/interrupt.h,v
retrieving revision 1.1.1.1.2.4
diff -u -r1.1.1.1.2.4 interrupt.h
--- interrupt.h 1998/12/22 00:34:48 1.1.1.1.2.4
+++ linux/include/linux/interrupt.h 1998/12/27 16:27:19
@@ -17,7 +17,7 @@

extern volatile unsigned char bh_running;

-extern atomic_t bh_mask_count[32];
+extern int bh_mask_count[32];
extern unsigned long bh_active;
extern unsigned long bh_mask;
extern void (*bh_base[32])(void);
Index: kernel/softirq.c
===================================================================
RCS file: /var/cvs/linux/kernel/softirq.c,v
retrieving revision 1.1.1.1.2.3
diff -u -r1.1.1.1.2.3 softirq.c
--- softirq.c 1998/12/22 00:53:26 1.1.1.1.2.3
+++ linux/kernel/softirq.c 1998/12/27 16:27:14
@@ -7,8 +7,9 @@
* enabled. do_bottom_half() is atomic with respect to itself: a
* bottom_half handler need not be re-entrant.
*
- * Fixed a disable_bh()/enable_bh() race (was causing a console lockup)
- * due bh_mask_count not atomic handling. Copyright (C) 1998 Andrea Arcangeli
+ * Fixed a disable_bh()/enable_bh() race (was causing a console lockup),
+ * both the two functions must run in a bh_atomic context.
+ * Copyright (C) 1998 Andrea Arcangeli
*/

#include <linux/mm.h>
@@ -20,7 +21,7 @@

/* intr_count died a painless death... -DaveM */

-atomic_t bh_mask_count[32];
+int bh_mask_count[32];
unsigned long bh_active = 0;
unsigned long bh_mask = 0;
void (*bh_base[32])(void);

Comments?

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/