Re: [patch] entry.S fix. [was: Re: scheduling problem?]

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Sat, 18 Dec 1999 01:41:58 +0100


I could be wrong as I'm half asleep but I think there is still a bug
with Ingo's do_signal fix :-)

> William Montgomery wrote:
> > [...] however, there is a related
> > problem which can occur. It is possible for a SCHED_OTHER task
> > to be running in user mode when a rtc_interrupt occurs which
> > sends a SIGIO to a SCHED_FIFO task, the SCHED_FIFO task is put
> > on the run queue and the need_resched flag of the SCHED_OTHER task
> > is set but the ret_with_resched is not run and the SCHED_OTHER task
> > can continue for several milliseconds before schedule is run.

> > The above scenario can also occur when setitimer is used as a
> > timing source and a SIGALRM is sent.

Those seems plausible. There is a similar race to the cpu_idle one in
entry.S which would cause both those problems.

> > Maybe we should *always*
> > run ret_with_resched upon ret_from_intr and not only when in
> > supervisor mode? Any reasons this would cause problems?

You're right in the diagnosis, but that fix could overflow the kernel
stack due to unbounded recusion.

In ret_with_reschedule, an interrupt can occur after checking
need_resched(%ebx) but before the iret in RESTORE_ALL.g The interrupt
sets need_resched, but cannot reschedule, and the interrupted task is
already in RESTORE_ALL so it doesn't check need_resched.

I can think of two solutions:

- cli before checking need_resched (or before checking it a second
time), and let the iret restore the interrupt flag. Simple but
not the fastest.

- have the ret_from_intr path check if it is returning to an
interrupted return path, and if so fixup up the return address to
do the same thing but reschedule instead of return to user mode.
This has no overhead in most cases. The standard return path is
not modified.

Ingo's do_signal fix is still needed in the midst of this.

I'm willing to code either but I'd like an opinion first: is adding a
cli before the need_resched(%ebx) check worth avoiding?

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/