I used a tiny program that simply writes to a file with a given request
size, to test fs sequential write performance (like IOZONE) on Linux.
I did several tests that wrote to 1 GB file using 1MB request size. The
application itself allocates 1 MB buffer and reuses it.
What I had found out is that when I ran multiple instances (say 3) of this
program against 2.4.0-test5, the kernel would eat up all the memory it has.
After I started the 4th instance of the program, the kernel simply hung on
me. I got messages from my ethernet card, saying that the card has no
resources at the time when the system hung also. I tried this with stock
kernel running against a 4-way Dell 6300 Xeon. I had 6 36 GB drives, and I
created 20 GB EXT2 file system on each of the drive. Each instance of the
program ran against one of the 6 file systems.
I suspected that this is a problem that has something to do with the fact
that the fs buffer cache is consumed much faster than can be
kflushd/kupdate'd. Eventually, the entire system memory is used by the fs
buffer cache, nothing else can grab any memory when they need it, which
caused things to fail, and eventually, the kernel hung. Based on such
speculation, I went back and changed the bdflush parameter values to use
much shorter time intervals between flushes and kupdate, etc. After
changing the values to ~1/3 of the originals, I was able to start 6
instances of the FS program rather than 4 before the kernel hung again. I
finally changed the intervals and some of the thresholds to really small
ones to really force data out frequently. After doing that, I managed to
run 9 instances of the application without hanging the system, but I think
this is really not what I wanted to do, and it may well hang again if I
keep on starting the runs.
I had also run SPEC SFS against the stock 2.4-test5 kernel running on the
same machine, where I ran into the same memory problem -- the kernel hung
due to the shortage of memory for file system buffer cache, I think. When
it hung, it spit out messages like "IP: queue_glue: no memory for gluing
queue d13604e0". Obviously, I simply cannot get SPEC SFS to run larger
configurations than 2500 IOPS on a 4-way Dell server class machine.
I guess my point is that there seems to be a problem in the way that the
Linux file system buffer cache uses the memory. Right now, it appears to me
that there is simply no control on the file system buffer cache usage, to a
point that even a simple application like IOZONE can kill the whole system.
Of course, this is only my guess, there may be other things that I don't
see. It also seems to be obvious that the file system should not let async
writes to happen if too much buffer space is used, i.e., the backlog is
huge. So, the normal writes should not return until the file system flushes
some amount of buffer space to disk, making room for the subsequent writes.
Using kflushd is not the same as forcing synchronous writes, since kflushd
is only a daemon after all. As long as the rate of buffer consumption is
higher than the buffer flushing, the current buffer cache/memory management
has a potential problem.
I'm wondering if anyone else had seen similar problems. Or maybe someone
already has a solution, it's just that I have not seen it.
Any suggestions/comments/solutions? Thanks a lot!
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Mon Aug 07 2000 - 21:00:07 EST