Re: [patch] lowlatency patch for 2.4, lowlatency-2.4.0-test6-B5

From: Andrew Morton (
Date: Fri Aug 04 2000 - 07:52:35 EST

Ingo Molnar wrote:
> i've ported my 2.2 lowlatency patch to 2.4.0-test6-pre1.

The latency numbers are low. It works well.

I'm testing lowlatency-2.4.0-test6-C4 on 2.4.0-test5. 500MHz uniprocessor.

We're getting lockups in sync_buffers(). It happens on the close of a block
device after there has been a lot of buffer dirtying and the `amlat' tool is
running to apply scheduling pressure.

`amlat' runs SCHED_FIFO and handles a 256 Hz stream of SIGIO's from /dev/rtc.

When `amlat' is killed, sync_buffers terminates and everything is OK.

To reproduce:

# amlat &
# dd if=/dev/zero of=x bs=1k count=100000
# lilo # hangs.

open("/dev/hde1", 0x4) = 6
stat("/dev/hda", {st_mode=S_IFBLK|0600, st_rdev=makedev(3, 0), ...}) = 0
open("/dev/hda", O_RDWR) = -1 EROFS (Read-only file system)
ioctl(6, HDIO_GETGEO, 0xbffff0dc) = 0
close(6 <<< Hangs

The program counter is pointing at:

            if (!bh)
                   goto repeat2;

            for (i = nr_buffers_type[BUF_DIRTY]*2 ; i-- > 0 ; bh = next) {
here >>> next = bh->b_next_free;

                    if (!lru_list[BUF_DIRTY])

The caller is the first call to sync_buffers() from fsync_dev().

It looks like the `goto repeat' in sync_buffers is the culprit - it keeps
setting `i' back to zero and never terminates. This is probably too
elaborate for you to bother reproducing - I'll wait for the next version, let
you know.

All the usual "don't do that's" are still there:

- 4-5 millisec scheduling bumps caused by closing raw block devices. This
may be related to sync_buffers(), but otherwise I suggest you not worry about
this case: it's only lilo and hdparm, etc.

- 4-6 millisecs during X server startup. Again, not worth fixing IMO.

- The infamous psaux thing - not worth fixing (Benno says 20 millisecs)

- There's some device driver module which takes ~100 millisecs when it's
loaded (Benno has details). Leave it alone - we can't and shouldn't audit all
the device driver initialisation code.

- I haven't tested fbdev - Pavel says it's bad.

There were also a couple of 4-5 millisec bumps which I cannot explain. I
will instrument the kernel and have a further poke around, but frankly I
don't want to spend too much time with the "take no prisoners" patch - I'd
rather work on the mainstream candidate.

> ...
> I strongly disagree with the "it's ok in 99.9% of the cases" approach,
> because in fact it's very easy to trigger bad latencies under various
> (common) workloads. And i just do not want Linux to become another
> Windows: "well you can play music just fine, as long as you dont do this
> and dont do that, and for God's sake, put enough RAM into your system.".

This is a significant misinterpretation of my position, but let's move on...

> reports, comments, suggestions welcome!

You said: "i'll split the patch up into an 'uncontroversial' and
'controversial' part"

Unfortunately it's all controversial. If you take this approach you'll end
up with an empty patch file :-)

I read Linus' June comments very carefully. I think what he's saying is:

- conditional_reschedule() is a kludge
- some kludges are acceptable
- keep the kludge count to the minimum
- only put rescheds in the places where we can clearly
  identify the reason - not just "it worked so I did it".
- he'd accept a handful of rescheds (no hard number).

And I'd add:

- Keep it clean: rescheds at the high levels. No dropping locks, obfuscating
other people's code, etc.

So given these criteria I suggest the approach is to only poke holes in the
places which demonstrably and explicably need it. As you know, I've
identified nine places, and these work well, except for the VM problem.

I don't think you should put _any_ rescheds in the VM! That's just covering
up a problem. The VM should _not_ be spending 60 millisecs (you saw 200)
just crunching on stuff. The approach should be to wait until the VM has
been fixed and to then reevaluate the need for rescheds in there.

If the VM guys will explain why 60-200 msec thinking time is correct and
appropriate behaviour then I'm dead wrong and the rescheds are needed. But I
don't think so...

Also, your assembler and .text.condsched changes are appropriate to the "no
prisoners" patch, but they should not be used in the "uncontroversial"
patch. Keep it simple and implement conditional_reschedule() in C. This is

1: If the code size is significant, the patch is wrong: it has too many

2: If the performance diff is significant, the patch is wrong:
conditional_reschedule() is being called too often.

Consider this: with my minimal patch, with the workload being building a
kernel whilst running `amlat' on UP, conditional_resched() is called 690
times per second and it actually does a schedule() twice a second. This is
very good. The max latency was 800 usecs.

The cost of executing

        if (current->need_resched)

every 1.3 milliseconds is vanishingly small and it does not merit an assembly

What latency target are you shooting for? My gut feel is that a worse-case
latency of two millisecs will require no more than 15 conditional_rescheds.
Pushing this down to one millisecond will require many more and is probably
not justified.

Also, what are the conditional_rescheds in copy_*_user for? I haven't
observed them doing anything very useful, maybe you have better/different
test cases. I took them out of your patch and didn't observe any

Finally, what workload are you using to test this patch?

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
Please read the FAQ at

This archive was generated by hypermail 2b29 : Mon Aug 07 2000 - 21:00:12 EST