Kernel 2.2.* freezes, temporary fix?

Nils Faerber (Nils.Faerber@unix-ag.org)
Tue, 12 Oct 1999 00:48:45 +0200


Hello all!
This will get a little longer...
Since my new machine freezes at random times with 2.2.* (even with
2.2.13pre16!) I wanted to try debugging it. First I activated serial console
in the hope that I get some debugging output there.
And some time later I really got a freeze with some dumps on the serial
console which told me that the running xmms (mp3 player) process caused some
kernel panic. I know that other people alos reported problems, i.e. freezes,
when using Pentium optimized programs like seti@home client. I then
recompiled xmms without Pentium optimization.
After that I ahd another freeze but this time no dump on either console,
serial or screen. Some time earlier I felt that when the battery, yes I am
talking about a notebook, becomes drained the freezes occur more often. So I
thougt of some APM bug or failure.
In order to find out at which point in the APM code the machine freezes I
added printk()s to every function of apm.c at the beginning and end at
KERNEL_DEBUG level. I now get an output like

Oct 12 00:36:57 toy kernel: apm: E apm_get_power_status()
Oct 12 00:36:57 toy kernel: apm: S check_events()
Oct 12 00:36:57 toy kernel: apm: S get_event()
Oct 12 00:36:57 toy kernel: apm: S apm_get_event()
Oct 12 00:36:57 toy kernel: apm: S apm_bios_call()
Oct 12 00:36:57 toy kernel: apm: E apm_bios_call()
Oct 12 00:36:57 toy kernel: apm: E get_event()
Oct 12 00:36:57 toy kernel: apm: E check_events()
Oct 12 00:36:58 toy kernel: apm: S check_events()
Oct 12 00:36:58 toy kernel: apm: S get_event()
Oct 12 00:36:58 toy kernel: apm: S apm_get_event()
Oct 12 00:36:58 toy kernel: apm: S apm_bios_call()
Oct 12 00:36:58 toy kernel: apm: E apm_bios_call()

about every second. I tried this at first with kernel 2.2.13pre15. After
that the system is running fine all day long!
Today I found kernel 2.2.13pre16 and tested that one. After plain compile
the machine froze again short time after reboot. Next I tried again my
patched apm.c with the above printk()s and it is stable again.

So what can we learn from that?
I do not think that the APM code has a bug but it seems that it can
circumvent the freeze by regularly calling printk().
So I looked into the printk() funktion and saw that printk() uses a lot
spin_lock_*() and spin_unlock_*. Maybe this way a locked down interrupt
gets unlocked?

The next point is that I had problems with my system clock and date. The
clock was too slow and sometimes even seemed to loose a lot more. After
my apm.c additions the clock is running correctly now. Strange isn't it?

Conclusion: I think we are loosing interrupts somewhere! And any interrupt
seems to be affected, even the timer interrupt (the reason why the system
clock differs).

I hope this can help tracing the freeze bug...
Any comments are absolutely welcome!
And if someone has any patch that I could test, as I have a system that
suffers from the freezes, and will do so ASAP!

So far for tonight...
CU
nils

-- 
Nils Faerber (Linux Nils)        eMail: nils@unix-ag.org
Student of computer science      http://www.si.unix-ag.org/~nils/
Unix user group, University of Siegen, Germany

Siegen ... the arctic rain forest!

--

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/