[patch] timers again

From: Andrew Morton (andrewm@uow.edu.au)
Date: Sat Jun 17 2000 - 08:23:36 EST


This one fixes up the del_timer_sync deadlock detector a bit.

- If a deadlock is detected, print the message and bust out, so the
  machine keeps running (in this case, del_timer_sync will end up
  acting like a really, really slow del_timer_async).

- Limit the number of messages to 10.

- Print some extra info (the timer handler address, a stack dump for
  x86).

- I'm offended to discover that __builtin_return_address(0) returns
  the wrong address under gcc 2.7.2.3 with -fomit-frame-pointer so I
  open-coded the stack peek and made it x86-specific.

- Moved the what-to-do instructions off of some random guy's web page
  and into Documentation/kernel-timers.txt

So, where do we now stand with timer deletion races?

- Various fixlets for net/ipv4, net/core and net/ipv6 are in Alexey's
  hands.

- With my earlier patch and the stuff which will trickle in from
  maintainers, the important net drivers are done.

- IDE is safe.

- drivers/char was mostly fixed in my timer_struct killer.

OTOH:

net/sunrpc: looks wrong - tasks can be woken after they've
            been removed from the wait queue.

drivers/video: some races in the cursor flashing code.
               Petr has a patch...

drivers/scsi: Dunno yet. SCSI makes my brain hurt. It
              could be that SCSI timers only ever expire under
              catastrophic conditions, so we may not have to worry.

drivers/net/slip.c: quagmire

drivers/net/wan/*: hard to fix

net/appletalk, net/decnet, net/ax25, net/ipx: look wrong, need work

net/irda: looks wrong. Very hard to fix.

drivers/net/pcmcia/*: probably not very important
                      for SMP. dhinds will be taking a look.

Once SCSI is sorted we're OK for a classical SMP "server". But not for
an SMP "desktop". It would be rather nice if more than one person was
looking into this...

--- linux-2.4.0-test1-ac19/kernel/timer.c Fri Jun 16 00:43:30 2000
+++ linux-akpm/kernel/timer.c Sat Jun 17 23:21:21 2000
@@ -244,9 +244,18 @@
                         while (timer_is_running(timer) && --count)
                                 ;
                         if (count == 0) {
-                                printk( "del_timer_sync(%p): deadlock! Called from %p\n",
-                                        timer, __builtin_return_address(0));
-                                printk("See http://www.uow.edu.au/~andrewm/linux/deadlock.html\n");
+                                static int ntimes = 10;
+                                if (ntimes) {
+                                        --ntimes;
+                                        printk( "del_timer_sync(%p): deadlock!\n", timer);
+                                        printk("handler=%p\n", timer->function);
+                                        printk("See Documentation/kernel-timers.txt\n");
+#ifdef CONFIG_X86
+                                        printk("Called from %p\n", *((void **)&timer - 1));
+                                        show_stack(0);
+#endif
+                                }
+                                return ret;
                         }
                 }
         }
--- linux-2.4.0-test1-ac19/Documentation/kernel-timers.txt Sat Jun 17 23:21:34 2000
+++ linux-akpm/Documentation/kernel-timers.txt Sat Jun 17 16:54:34 2000
@@ -0,0 +1,42 @@
+Kernel timer deadlock diagnostics
+Andrew Morton <andrewm@uow.edu.au>
+17 June 2000
+
+Kernel 2.4.0-test introduced a mechanism to detect and break out of
+deadlocks which may occur in a call to the del_timer_sync() function.
+
+If you're reading this file, then you have probably just seen a message like this:
+
+ del_timer_sync(c0137680): deadlock!
+ handler=c0123340
+ See Documentation/kernel-timers.txt
+ Called from c0108890
+ cec7ff4c c0220803 00000000 d082913f 00000246 00000000 00000001 00000000
+ 00000001 00000000 d0828155 d08292a4 d0828a95 00000005 d0828a77 d0828445
+ 00000000 00000000 d0828795 00000000 00000100 cecc3f28 d0828052 ffffffea
+Call Trace: [<c0227529> ... ]
+
+come out of your 2.4.x kernel.
+
+This means that the timer synchronisation code has detected a deadlock
+condition which will have to be fixed.
+
+Please take the following steps to generate a call backtrace:
+
+ 1. cd /usr/src/linux
+ 2. gdb vmlinux
+ 3. x/10i 0xc0108890 (The 'Called from' number above, with a '0x' in front)
+ 4. x/10i 0xc0123340 (The 'handler' address from above)
+ 5. x/10i 0xc0227529 (The 'Call Trace' number from above)
+ 6. x/10i .......... (Some more numbers from the 'Call Trace' list)
+
+Omit any addresses which do not start with "c0" (This is x86 specific.
+If you're running another platform, you probably know what to do).
+
+Please send an email to linux-kernel@vger.rutgers.edu describing
+
+- what has happened
+- how frequently you are seeing it
+- the output from the gdb commands above
+- system description as per the REPORTING-BUGS file
+
