Re: /tmp in swap space

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Wed, 27 May 1998 15:02:14 +0100


On Mon, May 25, 1998 at 11:50:23PM -0600, Larry McVoy wrote:
> Wait a second. I misunderstood you. I thought you were asking about
> metadata. If you are worried about the actual data, of course it doesn't
> write them, they are no longer associated with a file. Writing them
> would be pointless. I haven't looked recently, but here's what you do:
> go look at what happens when you call ftruncate(). That code path should
> go find all data blocks associated with the inode and invalidate them.
> The invalidation should clear the dirty bit. Not doing so would lead to
> chaos.

Ahhhhhhhhh....
Ahaha....

I was just going to write that I didn't see any such thing in that code
path, then I noticed the `bforget' and was enlightened. Ok, you're
right, I have no further questions on this.

One incy-wincy little thinglet though. `bforget' marks the buffer
clean, decrements the user count, then unhashes it. Does that mean
there may be other users using the buffer, who may not expect this to
happen? I.e., a race condition. I'm thinking that marking a buffer dirty
is always safe anyway, but maybe not forgetting one?

This is more apparent in the ext2 `trunc_direct' function. It checks
that some things haven't changed under ext2's feet each time round the
loop, then proceeds to change stuff itself. Or is this only because
`ext2_free_blocks' can block?

> The memory gets freed immediately. You can try this: write a tiny program
> which writes a file about 80% the size of memory, deletes it, and then
> time the rate at which it can do it the second time. It should go at
> bcopy speeds.

I just tried that using:

time dd if=/dev/zero of=tmpfile bs=27000k count=1; time rm tmpfile

and couldn't confirm the result.
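
Larry's suggested experiment can be scripted along these lines (a sketch; the 48000k figure is my assumption for roughly 80% of a 64MB machine, and `tmpfile' is an arbitrary name):

```shell
#!/bin/sh
# Write a file roughly 80% the size of RAM, delete it, then do the
# same thing again and compare timings. If deleting the file really
# un-dirties its buffers, the second write never waits on the disk
# and should run at memory-copy speed.
# Assumption: 48000k is about 80% of a 64MB machine; adjust for your RAM.
FILE=tmpfile
SIZE=48000k

time dd if=/dev/zero of=$FILE bs=$SIZE count=1
time rm $FILE

time dd if=/dev/zero of=$FILE bs=$SIZE count=1
time rm $FILE
```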

My RAM size is 64MB. The above takes about 0.7 seconds to do the `dd',
then about 4 seconds for the `rm'.

If I try using `bs=55000k' in the above, it takes several seconds just
to write the data using `dd'. `rm' takes several more seconds to delete
the file. These few seconds may well be long enough to write all, or
half of, the data to disk. (Disk read rate is about 9MB/sec with DMA,
4.5MB/sec without; I've never timed writes.)
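
(For the record, that read figure can be reproduced with something like this sketch. The sizes are assumptions: the file must be comfortably larger than RAM so the read-back can't be satisfied from the cache.)

```shell
#!/bin/sh
# Rough sequential read-rate check: write a file larger than RAM
# (here 100MB against 64MB of RAM -- an assumption, adjust to taste),
# then time reading it back. Divide bytes by seconds for MB/sec.
dd if=/dev/zero of=bigfile bs=1000k count=100
time dd if=bigfile of=/dev/null bs=1000k
rm bigfile
```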

If I try `bs=35000k', when free reports "buffers" as 28000k, `dd' takes
about 4 seconds and `rm' about 0.7 seconds.

These timings seem to depend on the relative size of `bs=???' and
"buffers" as reported by `free'. "buffers" does not change much after
each test, but can certainly have different values according to prior
system activity. So for the middle values (35000k), it is a matter of
luck whether it's going to take a while in `dd' or in `rm'. The overall
time seems about the same though.
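
The sweep can be automated along these lines (a sketch; the three sizes are my picks for a 64MB machine and should bracket whatever "buffers" figure `free' reports on yours, and `free' output layout varies between versions):

```shell
#!/bin/sh
# Time the dd and the rm separately for block sizes below, near, and
# above the "buffers" figure reported by free. Sizes are assumptions
# for a 64MB machine; pick values that bracket your own "buffers".
for SIZE in 27000k 35000k 55000k; do
    echo "=== bs=$SIZE ==="
    time dd if=/dev/zero of=tmpfile bs=$SIZE count=1 2>/dev/null
    time rm tmpfile
    free    # watch whether "buffers" moves between runs
done
```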

The interesting things are:

- The memory allocated for "buffers" stays about the same after
successive invocations of the above. If I use a size for `dd' that
is larger than the buffers size, the `dd' takes several seconds and
the `rm' takes less than a second, and the buffers size increases
by some tiny amount.

- After the `dd', the system is quite unresponsive and there is some
(but not much) disk activity for a few seconds. I expect this
is `kswapd' being busy, but I can't tell. It may be IDE I/O
(I have DMA disabled). `top' just crashes for me (all the time
with 2.1.96).

- This unresponsiveness may be why the `rm' takes so long, in the
cases where the `dd' size is smaller than "buffers".

- Doing the above but to different files on different partitions
successively gives similar timings. So it would appear that
either `rm' is properly un-dirtying the file data buffers, or
the data is getting written to disk during the slow `rm' time.
If I (now) understand the code, though, the buffers are simply
freed if they haven't made it to disk already.

So if my guessing is correct, the major slowdowns are:

- kswapd getting busy.

- "buffers" being unwilling to change size in this situation.
Which is probably a good thing -- writing sequentially is
obviously faster than searching for other data to page out.

And a story.

The first time, I forgot `count=1'. After 2 minutes of disk activity, I
hit control-C, thinking the 27000k must be written by now. Then I
looked at the file and was momentarily surprised to see a file about
720000000 bytes long... (720 meg, folks). Gee, this thing is faster than
I thought. (2.1.96 on a K6/233 w/Quantum Fireball ST 6.4GB disk, DMA
_turned off_ because it causes occasional system freezes (still
investigating...). I'd expect approx. twice the data throughput with
DMA on; certainly that is the case for reading).

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu