Problems diagnosing hard lockups in linux kernel

From: James Finnie (jf1@IMERGE.co.uk)
Date: Mon Oct 14 2002 - 06:31:31 EST


Hi guys,

We are having real trouble diagnosing a hard lockup that we are seeing on
our platform, which is based around the Natsemi Geode GX1 plus 5530A
companion chip. Also present on the board is a MacPhyter DP83815 rev C
ethernet controller, and an IC Ensemble Envy 24 PCI-i2s controller.

Our previous distro of software for the platform was based around RedHat 6.1
+ newer GLIBC(RH71) (to fix PTHREADs buggyness). The kernel in use was
Linux 2.4.17 + a couple of very trivial cosmetic patches + the NatSemi
framebuffer driver. This combination works great.

We wanted to switch to a RedHat 7.3 based system as we were slightly
uncomfortable about running with everything compiled against a different
GLIBC to that installed. We kept the kernel and all drivers exactly the
same, and changed just the RedHat RPMS. Now we are seeing hard lockups on
our boxen after around 12-20 hours! The lockup is severe; caps lock is
dead, as is ping. There is no OOPS.

In our attempts to debug this (now ongoing for 3 weeks) we've tried the
following:

* Upver kernel to latest 2.4.19 + 2.4.20preX patches incase we were seeing a
known bug
* Compile kernel with kgdb and MagicSysRq - neither of which we can actually
get into as the lock is so severe
* Run without the National framebuffer - similar frequency of locks is seen.
Also see the problem when using VESA fb.
* change kernel compiler

We don't see this failure on other hardware - or at least not at anywhere
near this level of incidence - hence we are suspecting that this is a kernel
level lock, which is only now being exposed by the different utilization
patterns created by the new userland stuff.

The only thing that seems to fix things at the moment is the complete
removal of all the userland stuff... less than ideal for our application :-)

Does anyone have any clue as to other ways we can investigate the lock when
it occurs? Are there any "hired guns" out there who would would be
available to help us out with this?

Thnaks in advance for any suggestions;

James Finnie

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Imerge Limited Tel :- +44 (0)1954 783600
Unit 6 Bar Hill Business Park Fax :- +44 (0)1954 783601
Saxon Way Web :- http://www.imerge.co.uk
Bar Hill
Cambridge
CB3 8SL
United Kingdom
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 15 2002 - 22:00:49 EST