Re: elevator-starvation-4 (2.2.14 && 2.3.42)

From: Bruce Thompson (bruce@otherother.com)
Date: Fri Feb 11 2000 - 09:26:32 EST

Next message: Butter, Frank: "RE: 3dfx 3500TV"
Previous message: Andre Hedrick: "Re: Mirroring Root Disks"
In reply to: Matthew Kirkwood: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Next in thread: Andrea Arcangeli: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Reply: Andrea Arcangeli: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Reply: Stephen C. Tweedie: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,
I'm not trying to say that it doesn't happen, I'm certain that it
does and that your example is a great one for showing how and when it
can happen. Where I think we disagree is on what the root cause is
and where is the best place to address the issue.

I hate to reiterate, but I will; my assertion is that as long as
the user process is permitted to generate requests faster than the
driver can service them then it is inevitable that some request will
take an unreasonable amount of time to be serviced. The best way to
solve this problem is left as an exercise for the student :-).

It need not be a case of "The driver can handle X MB/sec so give
the other one X/2 MB/sec of requesting ability". One thought that
comes to mind is a simple counter of outstanding requests per
process. Once the counter exceeds some particular highwater mark the
process is blocked from entering more requests until the counter
drops to some lowwater mark. Hmm. ANother thought. Make the counter
device specific. Block anyone attempting to enter a new request on
the device once the request queue has reached a high water mark and
resume accepting requests once a low water mark has been reached.

It isn't terribly surprising to me that the problem seems to
develop more with IDE than SCSI: Time spent generating requests for
an IDE device is time that the driver cannot service those requests
since the CPU must actively work to service the requests. Inverting
this, ensuring that when the CPU has work to do at the device level
that this takes priority over other activities will I think go a long
way to addressing this priority inversion.

Either way, it's an interesting problem, and I truly do believe
that attacking it via mods to the elevator algorithm is merely
putting off the inevitable whereas attacking the root cause will
allow us to degrade much more gracefully in the face of massive I/O
activity.

Cheers,
Bruce.

At 08:40 +0000 02/11/2000, Matthew Kirkwood wrote:
>On Thu, 10 Feb 2000, Bruce Thompson wrote:
>
>> Sorry, but I must respectfully disagree. The time to service the
>> request specifically is _not_ unbounded, it's bounded by the size
>> of the disk.
>
>Bounded by the size of a 100Gb disk array means bounded by several
>hours. I don't want to have to wait several hours for ps to read
>in /etc/passwd so I can kill whatever process went nuts.
>
>> In general I would assert that the probability of the user process
>> swamping the driver in this manner is extremely low, or else the
>> relative priority between the driver and the user process is mixed
>> up.
>
>It happens on reiserfs. Try it.
>
>And if my disk bus/disk/driver can write X Mb per second, and I have a
>single process which wants to write X Mb per second then, filesystem
>permitting, I don't want my task to get stuck writing (X/2) Mb / sec,
>just because some silly heuristic said that it didn't deserve it.
>
>> I would further argue that before sending the block up to the user
>> process you really ought to kick-off servicing block 11 since that's
>> going to take a long time and the user process should be able to
>> proceed in the mean time.
>
>My example was very simplistic. Think how "cp /dev/zero ." works:
>
>Read block from /dev/zero
>Write block to ./zero
>Repeat
>
>Unless you have very slow CPUs, you will fill up RAM with dirty pages/
>buffers eventually. That will cause the write to stall while the kernel
>pushes out dirty pages to disk. Once the first lot of IOs have completed
>and a few others have been scheduled, the cp wakes up, finishes its write
>and gets on with filling your disk.
>
>Now, depending upon your filesystems alloction strategies, it's entirely
>possible completely to starve a process waiting for a teeny-tiny IO at
>the other end of the disk (or at the same end, but you're moving away
>from it).
>
>Really, it happens. You don't see it so mcuh with SCSI, but it happens
>a lot with IDE.
>
>Matthew.

-- -- ------------------------------------------------------------------------ Bruce Thompson | "Pinky! Are you pondering what I'm Palm Computing, Inc. | pondering?" | "Uh, I think so Brain, but where will | we find a duck and a hose at this bruce@otherother.com | hour?" My opinions are strictly my own.

PGP Fingerprint: 8F48 7FEF EE22 14FB 1F2E 3BF2 0D40 9628 53E8 72EB

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

Next message: Butter, Frank: "RE: 3dfx 3500TV"
Previous message: Andre Hedrick: "Re: Mirroring Root Disks"
In reply to: Matthew Kirkwood: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Next in thread: Andrea Arcangeli: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Reply: Andrea Arcangeli: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Reply: Stephen C. Tweedie: "Re: elevator-starvation-4 (2.2.14 && 2.3.42)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:20 EST