* Marton Balint <cus@xxxxxxxxxx> wrote:
On Wed, 19 Aug 2009, Peter Zijlstra wrote:
On Wed, 2009-08-19 at 14:34 +0200, Marton Balint wrote:
On Wed, 19 Aug 2009, Peter Zijlstra wrote:
On Wed, 2009-08-19 at 14:01 +0200, Marton Balint wrote:
On Wed, 19 Aug 2009, Peter Zijlstra wrote:
On Tue, 2009-08-18 at 21:49 +0200, Marton Balint wrote:
In the meantime, I was able to create a tiny C program which always
successfully reproduces the bug. It's basically an endless loop which does
not stop while the process is running on the last CPU core. The program
creates multiple instances of itself, to be able to keep all of the CPU
cores busy. After 1 second, the processes running on other than the last
CPU core die, the processes running on the last CPU core remain stuck
there...
I tested it on my dual core system; if someone could test it on a quad
core and report back that would probably be useful.
Usage: ./schedtest <number of CPU cores>
And don't forget to kill the stuck processes after using the program! :)
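The test program itself is not reproduced in this part of the thread. The following is only a rough sketch of what the description above implies, not the original schedtest source. It assumes glibc's sched_getcpu() is available; the names keep_spinning(), seconds_since(), spin_worker() and run_schedtest(), as well as the choice of how many workers to fork, are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: a worker always spins during the first second;
 * after that it keeps spinning only while it sits on the last core. */
int keep_spinning(int cpu, int ncpus, double elapsed_sec)
{
    return elapsed_sec < 1.0 || cpu == ncpus - 1;
}

/* Seconds elapsed since *start, measured on the monotonic clock. */
double seconds_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec)
         + (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Busy loop: returns once the worker runs on any core but the last one
 * (and at least one second has passed). */
void spin_worker(int ncpus)
{
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (keep_spinning(sched_getcpu(), ncpus, seconds_since(&start)))
        ;
}

/* Fork nworkers busy-looping children; the parent returns immediately.
 * How many workers the original program starts is an assumption here. */
int run_schedtest(int ncpus, int nworkers)
{
    assert(ncpus > 0 && nworkers > 0);

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            spin_worker(ncpus);
            _exit(0);
        }
    }
    return 0;
}
```

A main() would parse the core count from the command line and call run_schedtest() with it. With a working load balancer, at most one worker should be left spinning on the last core after a second or so; the report here is that several stay there.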
So what's the bug? Sure one task will stay on the cpu, and because there
is no contention it doesn't get migrated, and therefore won't quit,
how's that a problem?
The problem is that more than one process remains on that CPU core, and
none of them gets migrated to other (idle) cores. I tested it with my E8400
processor and 2.6.31-rc5-git3 kernel.
Only one remains here.. on a c2q running 2.6.31-rc6-tip
Do you have a .config handy?
Yes it's in my original post:
http://marc.info/?l=linux-kernel&m=125012584709800&w=2
Right you are... so I built a kernel with the cgroup scheduler enabled and
tested it on a dual-core opteron machine, but I can't seem to reproduce
this.
Are you using cgroups in any way, or do you simply have it enabled in
your config?
No, it's just enabled. Actually the kernel is from the
openSUSE build service:
http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.1/x86_64/
But the problem is present for both the kernel-default
kernel and the kernel-vanilla kernel which does not
contain any suse-specific patches.
This evening I had a bit more time to test, and I've
made a surprising discovery: I can only reproduce the
bug if the kernel module of my TV tuner card is loaded.
I have a Leadtek Winfast 2000 XP Expert TV card; it
uses the cx8800 kernel module. It seems that the
problem is somehow related to the infrared sensor of
the TV card, because I recompiled the module with the
'case CX88_BOARD_WINFAST2000XP_EXPERT:' line removed
from cx88-input.c and I couldn't reproduce the bug with
the new kernel module.
Extremely weird. Are timers somehow busted?