Re: scheduler ignores need_resched flag in 2.2.x and 2.3.x?

From: yodaiken@fsmlabs.com
Date: Tue Mar 21 2000 - 00:32:59 EST


As far as I can tell,
only place need_resched is set is in the scheduler and the sched idle loop.
This is what makes Ingo's latency patch less dangerous: need_resched is hard
to set. Your scenario requires that an interrupt routine, interrupting
kernel mode, set needs_resched. Where does this happen?

On Mon, Mar 20, 2000 at 02:59:28PM -0800, Jun Sun wrote:
>
> It appears need_resched flag is changed from a global variable to
> an instance variable inside task_struct since v2.2.x. I found out
> that the flag could be ignored by the scheduler. This potentially
> causes up to 200 ms delay in dispatching the highest priority process.
>
> THE SCENARIO
>
> 1. Among other things, the schedule() function picks the process with
> highest value of goodness, and switches the context from the current
> process to the new one.
>
> 2. *AFTER* the scheduler picks the process with the highest goodness
> value
> but *BEFORE* the context is switched, the interrupts are open.
>
> 3. If an interrupt happens during that time window, and if the interrupt
> handler sets the need_resched flag, the need_resched flag will be set
> inside the CURRENT process'es task_struct.
>
> 4. Once the interrupt handling is done, the scheduler will continue to
> do the context switching. The process with the highest goodness
> value
> will become the CURRENT process, and, presumably, its need_resched
> flag
> is clear.
>
> 5. Later on the kernel only checks the need_resched flag in the CURRENT
> process to decide whether it needs to do any rescheduling. Since the
> need_resched flag in the new CURRENT process task_struct is clear,
> the kernel will not do any rescheduling, unless due to other reaons
> that causes the need_resched in the new CURRENT process to be set.
>
> If the highest priority process is made availabe in step 3, it could be
> delayed from execution for up to 200ms, the maximum time slice a process
> can run without a forced rescheduling.
>
>
> THE TEST
>
> I did a test to confirm the above scenario. It involves the parallel
> port (to generate interrupts), a driver module, a user test program,
> and a small patch of kernel.
>
> This test demonstrates that a highest priority process can be
> delayed from execution for up to 200 ms due to the above scenario.
>
> If there are enough interestes, I will post the test itself
> and the test results.
>
> The test is done on v2.2.12. But the code is still the same in
> v2.3.x. I assume the same bug exists in v2.3.x.
>
>
> THE FIX
>
> First of all, I think this is must-fix bug. (Any arguments here?)
>
> There are a couple of ways to fix it. I personally favor the following
> :
>
> Inside the schedule() function, after the switch_to() and
> __schedule_tail()
> calls, add the following code :
>
> if (prev->need_resched) {
> current->need_resched = 1;
> prev->need_resched = 0;
> }
>
> This piece of code in a sense lets the new CURRENT process "inherits"
> the value of need_resched from the previous CURRENT process.
>
>
> DO I MISS ANYTHING HERE?
>
> <to be filled by you ...:-0 >
>
> Jun
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

-- 
---------------------------------------------------------
Victor Yodaiken 
FSMLabs:  www.fsmlabs.com  www.rtlinux.com
FSMLabs is a servicemark and a service of 
VJY Associates L.L.C, New Mexico.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:31 EST