Re: Badblocks and no free pages...

Doug Ledford (dledford@dialnet.net)
Mon, 05 May 1997 04:37:07 -0500


--------
> Doug Ledford wrote:
> > It's all because we are filling up all available RAM with write behind
> > buffers. Whenever it can't get a free page, it simply waits for some to
> > become available. Not a bug really, just shows us that the program is
> > writing as fast as it can.
>
> Ho! stop. The "couldn't get a free page" is when kmalloc gets a request
> that it has to pass on, and that this request couldn't get honoured. It
> was requested at a level that didn't allow sleeping or even the sleeping
> failed.....

It is also harmless in this case. Most likely (I haven't looked into it
deeply, only because I knew it was harmless) we are hitting a kmalloc for a
DMA buffer for a write at a moment when no RAM is free until some other
writes complete. In that case, it's up to the block driver or the hardware
driver (whichever one made the failed request) to either wait until some RAM
is free (schedule() while waiting for the RAM) or pass the command back up as
incomplete, at which point it gets retried.
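
To make the "fail now, retry after other writes complete" idea concrete, here
is a minimal user-space sketch in C. It is not kernel code: the page_pool
structure and every function name below are hypothetical, and a write
completing is simulated by a plain function call rather than by real I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a fixed pool of pages used for write buffers. */
struct page_pool {
    int free_pages;   /* pages currently available */
    int in_flight;    /* writes holding a page until they complete */
};

/* Non-sleeping allocation: take a page if one is free, else fail at once. */
static bool pool_try_alloc(struct page_pool *p)
{
    assert(p->free_pages >= 0);
    if (p->free_pages == 0)
        return false;          /* the "couldn't get a free page" case */
    p->free_pages--;
    p->in_flight++;
    return true;
}

/* A write finishing returns its page to the pool. */
static void pool_complete_one(struct page_pool *p)
{
    if (p->in_flight > 0) {
        p->in_flight--;
        p->free_pages++;
    }
}

/*
 * Submit one write, retrying after each completion until a page is free.
 * Returns the number of failed attempts before success, or -1 if nothing
 * is outstanding and so no page can ever be freed.
 */
static int submit_write_with_retry(struct page_pool *p)
{
    int failures = 0;

    while (!pool_try_alloc(p)) {
        if (p->in_flight == 0)
            return -1;          /* waiting cannot help */
        failures++;
        pool_complete_one(p);   /* stands in for waiting on an earlier write */
    }
    return failures;
}
```

With a pool of two pages, the first two calls to submit_write_with_retry()
succeed immediately and the third fails once, then succeeds after an earlier
write completes. The failure is visible but nothing is lost, which is the
sense in which the message is harmless here.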

>
> > > - The unusable sluggishness of the machine is a bug.
> >
> > It was never intended to be something that you would run during normal
> > usage, it's a shake down, tear the drives and controllers apart type test
> > that should be run when you are aware of what these types of tests do to
> > machine performance and are prepared to wait for it to finish before
> > actually trying to do anything :)
>
> Actually having several programs write like hell filling all available
> memory with write-behind buffers is not that unlikely. The kernel
> should somehow try to make this situation feel less sluggish.

I'm not sure there is much a person could do short of throttling those
applications. To do that you would need to detect this behaviour and then
take the needed steps, which means some sort of heuristic, which means it is
prone to failing under certain circumstances. I'm also not so sure that four
programs simultaneously attempting to write in excess of 1GB in total, all
using non-blocking I/O, is really all that common. The second problem
with trying to make the system responsive in this case is disk access. The
real sluggishness is caused not by the lack of RAM, but by the lack of
available drive bandwidth, controller bandwidth, and hence, swap bandwidth.
Even with no free RAM, the system will still perform reasonably, assuming
you don't have 32 or 64MB worth of writes queued to the drives in front of
you.
A fix in this case would require some sort of prioritization of disk access.
Even then you can still have problems, since you may not be able to jump
requests very far ahead in the queue. Take, for instance, a SCSI controller
with 28 commands per LUN and tagged queueing: no matter how high you
prioritize a command, if you already have 28 sent to the drive, then it's up
to the drive when this one gets processed.
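
The limit on queue jumping can be sketched in a few lines of C. This is a toy
user-space model, not driver code: the lun_queue structure, the function
names, and MAX_PENDING are all hypothetical, and only the depth of 28 comes
from the example above. Priority reorders what is still held by the host;
anything already sent to the drive is out of reach.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 28   /* commands per LUN, from the example above */
#define MAX_PENDING 64   /* arbitrary host-side limit for this sketch */

struct request {
    int id;
    int priority;        /* higher value = more urgent */
};

struct lun_queue {
    struct request pending[MAX_PENDING]; /* still in the host, reorderable */
    size_t n_pending;
    int n_on_device;                     /* already sent; drive picks order */
};

/* Queue a request in the host. Returns 0 on success, -1 if the queue is full. */
static int lun_enqueue(struct lun_queue *q, int id, int priority)
{
    if (q->n_pending == MAX_PENDING)
        return -1;
    q->pending[q->n_pending].id = id;
    q->pending[q->n_pending].priority = priority;
    q->n_pending++;
    return 0;
}

/*
 * Send the most urgent pending request if the drive has a free slot.
 * Returns the id sent, or -1 if the drive is full or nothing is pending.
 */
static int lun_dispatch(struct lun_queue *q)
{
    size_t best = 0, i;
    int id;

    if (q->n_on_device >= QUEUE_DEPTH || q->n_pending == 0)
        return -1;
    for (i = 1; i < q->n_pending; i++)
        if (q->pending[i].priority > q->pending[best].priority)
            best = i;
    id = q->pending[best].id;
    /* Close the gap, keeping arrival order among the remaining requests. */
    for (i = best; i + 1 < q->n_pending; i++)
        q->pending[i] = q->pending[i + 1];
    q->n_pending--;
    q->n_on_device++;
    return id;
}

/* The drive finished one command, in whatever order it chose. */
static void lun_complete_one(struct lun_queue *q)
{
    assert(q->n_on_device > 0);
    q->n_on_device--;
}
```

In this model, once 28 requests are on the drive, lun_dispatch() returns -1
even for a request enqueued with the highest priority. That request goes out
ahead of the other pending ones only after lun_complete_one() frees a slot,
and it then still sits behind up to 27 commands whose ordering the host no
longer controls.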
