Re: SCHED_RR is broken + patch

From: George Anzinger (george@pioneer.net)
Date: Wed Apr 26 2000 - 07:07:52 EST


Borislav Deianov wrote:
>
> In article <Pine.GSO.3.95.1000425130016.5842C-100000@thales> you wrote:
> > [[ I sent a similar mail to Alan Cox some time ago but I didn't get a
> > response and the fix isn't included in 2.2.16pre1 either. ]]
> >
> > I think I found a bug in the Linux scheduler: a running SCHED_RR
> > process is not preempted when its time slice is exhausted; it just
> > keeps running. The following program demonstrates the problem
> > [program snipped; a sketch of one follows the quoted text below]:
> > in theory the child should be preempted when its time slice is
> > exhausted and the parent should be allowed to run. Unfortunately
> > this doesn't work.
>
> Yes, this and other problems with the scheduler have been known for
> quite a while and are still present in 2.3.99:
>
> SCHED_RR threads don't work
> sched_yield doesn't work for RT threads
> sched_yield doesn't always yield for SCHED_OTHER threads
> wrong wake up order for RT threads
> sched_rr_get_interval returns constant bogus value
> counter for SCHED_FIFO threads is never reset
> several wrong and misleading comments in the code
>
> Alan, _please_ add the first item (at least) to the 2.4 jobs
> list. It's a documented feature that just plain doesn't work and I
> think it's a bloody shame, especially since a fix exists (by Artur
> Skawina).
>
> > -        if (prev->state == TASK_RUNNING)
> > +        if ((prev->state == TASK_RUNNING) && !prev2)
> >                  goto still_running;
>
> The schedule() code is intentionally spaghettified to allow it to run
> without taking any conditional jumps in the common case. I think the
> above code would break that property.
>
> > -                if (can_schedule(p)) {
> > +                if (can_schedule(p) || (p == prev2)) {
>
> This adds an extra test for each process in the runqueue. I don't
> think it's possible to fix the SCHED_RR problem without any additional
> cost in the common path but it can at least be constant.
 
Have a look at this patch for the 2.2.14 sched.c. It fixes the first
four of the above items and also fixes the "misses 'need_resched' if it
is set after selection of the new task but before the switch" problem.
For the latter, I test the flag again just before exit and go around
again if it is set. On my system this makes a real difference, probably
because need_resched was being missed in the idle task or in the
interrupt return path.

This patch removes a per-process test from the selection loop on UP
systems but adds one on SMP systems. Also, a SCHED_YIELD request now
survives a counter recalculation.

Note that prev_goodness() is no longer used, but I have not patched it
out.

As for the SCHED_FIFO counter never being reset: I notice that later
versions of the scheduler do not count down the counter when it is
negative, to avoid counting down the idle tasks. Given this, I suggest
that sched_setscheduler just set the counter negative for SCHED_FIFO
tasks, along the lines of the sketch below.
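
Something like this, inside setscheduler() after the new policy has
been validated (a sketch only - the field names are from my reading of
2.2, and it assumes the timer tick really does skip negative counters):

        /*
         * Sketch: once the new policy has been validated and stored.
         * If the timer tick skips the decrement for negative counters
         * (as later schedulers do, to spare the idle tasks), a
         * SCHED_FIFO task parked at a negative counter never ticks
         * down and never needs a reset.
         */
        p->policy = policy;
        p->rt_priority = lp.sched_priority;
        if (policy == SCHED_FIFO)
                p->counter = -100;      /* same trick as the idle task */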

--- linux-2.2.14/kernel/sched.c Tue Apr 11 04:30:52 2000
+++ linux/kernel/sched.c Wed Apr 26 02:32:20 2000
@@ -115,12 +115,12 @@
 #ifdef __SMP__
 
 #define idle_task(cpu) (task[cpu_number_map[(cpu)]])
-#define can_schedule(p) (!(p)->has_cpu)
+#define can_schedule(p) (!(p)->has_cpu || (p) == prev)
 
 #else
 
 #define idle_task(cpu) (&init_task)
-#define can_schedule(p) (p != prev)
+#define can_schedule(p) (1)
 
 #endif
 
@@ -322,7 +322,7 @@
         int this_cpu = smp_processor_id();
         struct task_struct *tsk;
 
-        tsk = current;
+        tsk = cpu_curr(this_cpu);
         if (preemption_goodness(tsk, p, this_cpu) > 0)
                 tsk->need_resched = 1;
 #endif
@@ -687,8 +687,9 @@
 {
         struct schedule_data * sched_data;
         struct task_struct *prev, *next, *p;
-        int this_cpu, c;
+        int this_cpu, c, oldcounter;
 
+try_try_again:
         if (tq_scheduler)
                 goto handle_tq_scheduler;
 tq_scheduler_back:
@@ -741,6 +742,7 @@
         /* Default process to select.. */
         next = idle_task(this_cpu);
         c = -1000;
+        oldcounter = 0;
         if (prev->state == TASK_RUNNING)
                 goto still_running;
 still_running_back:
@@ -768,6 +770,7 @@
                 }
                 p = p->next_run;
         }
+        prev->counter += oldcounter;
 
         /* Do we need to re-calculate counters? */
         if (!c)
@@ -825,7 +828,10 @@
 
 same_process:
   
+        prev->policy &= ~SCHED_YIELD;
         reacquire_kernel_lock(current);
+        if (current->need_resched)
+                goto try_try_again;
         return;
 
 recalculate:
@@ -841,9 +847,12 @@
         }
 
 still_running:
-        c = prev_goodness(prev, prev, this_cpu);
-        next = prev;
-        goto still_running_back;
+        if (!(prev->policy & SCHED_YIELD))
+                goto still_running_back;
+
+        oldcounter = prev->counter;
+        prev->counter = 0;
+        goto still_running_back;
 
 handle_bh:
         do_bottom_half();
