Re: CPU scheduler weirdness?

From: Ingo Molnar
Date: Thu Aug 20 2009 - 06:56:58 EST



* Marton Balint <cus@xxxxxxxxxx> wrote:

>
> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>
>> On Wed, 2009-08-19 at 14:34 +0200, Marton Balint wrote:
>>>
>>> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>>>
>>>> On Wed, 2009-08-19 at 14:01 +0200, Marton Balint wrote:
>>>>> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>>>>>> On Tue, 2009-08-18 at 21:49 +0200, Marton Balint wrote:
>>>>>>
>>>>>>> In the meantime, I was able to create a tiny C program which always
>>>>>>> successfully reproduces the bug. It's basically an endless loop which does
>>>>>>> not stop while the process is running on the last CPU core. The program
>>>>>>> creates multiple instances of itself, to be able to keep all of the CPU
>>>>>>> cores busy. After 1 second, the processes running on other than the last
>>>>>>> CPU core die, the processes running on the last CPU core remain stuck
>>>>>>> there...
>>>>>>>
>>>>>>> I tested it on my dual core system, if someone could test it on a quad
>>>>>>> core and report back that would probably be useful.
>>>>>>>
>>>>>>> Usage: ./schedtest <number of CPU cores>
>>>>>>>
>>>>>>> And don't forget to kill the stuck processes after using the program! :)
>>>>>>
>>>>>> So what's the bug? Sure one task will stay on the cpu, and because there
>>>>>> is no contention it doesn't get migrated, and therefore won't quit,
>>>>>> how's that a problem?
>>>>>
>>>>> Problem is that more than one process remains on that CPU core, and none
>>>>> of them get migrated to other (idle) cores. I tested it with my E8400
>>>>> processor and 2.6.31-rc5-git3 kernel.
>>>>
>>>> Only one remains here.. on a c2q running 2.6.31-rc6-tip
>>>>
>>>> Do you have a .config handy?
>>>>
>>>
>>> Yes it's in my original post:
>>>
>>> http://marc.info/?l=linux-kernel&m=125012584709800&w=2
>>
>> Right you are,.. so I built a kernel with the cgroup scheduler enabled and
>> tested it on a dual-core opteron machine, but I can't seem to reproduce
>> this.
>>
>> Are you using cgroups in any way, or do you simply have it enabled in
>> your config?
>
> No, it's just enabled. Actually the kernel is from the
> openSUSE build service:
>
> http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.1/x86_64/
>
> But the problem is present for both the kernel-default
> kernel and the kernel-vanilla kernel which does not
> contain any suse-specific patches.
>
> This evening I had a bit more time to test, and I've
> made a surprising discovery: I can only reproduce the
> bug if the kernel module of my TV tuner card is loaded.
> I have a Leadtek Winfast 2000 XP Expert TV card, which
> uses the cx8800 kernel module. It seems that the
> problem is somehow related to the infrared sensor of
> the TV card, because I recompiled the module with the
> 'case CX88_BOARD_WINFAST2000XP_EXPERT:' line removed
> from cx88-input.c and I couldn't reproduce the bug with
> the new kernel module.

Extremely weird. Are timers somehow busted?

Ingo