Re: Linux 2.6.29

From: Mark Lord
Date: Mon Mar 30 2009 - 14:39:21 EST


Chris Mason wrote:

I had some fun trying things with this, and I've been able to reliably
trigger write-cache stalls of ~60 seconds on my Seagate 500GB SATA
drive. The worst I saw was 214 seconds.
..

I'd be more interested in how you managed that (above),
than the quite different test you describe below.

Yes, different, I think. The test below just times how long a single
chunk of data might stay in the drive's cache under constant load,
rather than how long it takes to flush the drive cache on command.

Right?

Still, useful for other stuff.

It took a little experimentation, and I had to switch to the noop
scheduler (no idea why).

Also, I had to watch vmstat closely. When the test first started,
vmstat was reporting 500KB/s or so of write throughput. After the test
ran for a few minutes, vmstat jumped up to 8MB/s.

My guess is that the drive has some internal threshold for when it
decides to write only to cache. The switch to 8MB/s is when it switched
to cache-only goodness. Or perhaps the attached program is buggy and
I'll end up looking silly...it was some quick coding.

The test forks two procs. One proc does 4k O_DIRECT writes to the first
26MB of the test file (/dev/sdb for me).

The idea is that we fill the cache with work that is very beneficial to
keep in cache, but that the drive will tend to flush out because it is
filling up tracks.
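
For readers without the attachment, the hot-write loop described above
might look roughly like the sketch below. It is not the attached
program: the names (hot_offsets, open_target, hot_write_pass), the fill
pattern, and the strictly sequential write order are all assumptions
made for illustration. O_DIRECT needs an aligned buffer, so the sketch
borrows a page-aligned one from an anonymous mmap.

```python
import mmap
import os

BLOCK = 4096                 # 4k writes, as described above
HOT_ZONE = 26 * 1024 * 1024  # first 26MB of the target


def hot_offsets(zone=HOT_ZONE, block=BLOCK):
    """Block-aligned offsets covering the hot zone once, in order.

    Sequential order is an assumption; the mail does not say.
    """
    return range(0, zone, block)


def open_target(path, direct=True):
    """Open the target read/write, optionally with O_DIRECT (Linux only)."""
    flags = os.O_RDWR
    if direct:
        flags |= os.O_DIRECT
    return os.open(path, flags)


def hot_write_pass(fd, zone=HOT_ZONE, block=BLOCK):
    """Write one block at every offset in the hot zone; return bytes written."""
    buf = mmap.mmap(-1, block)   # anonymous mapping, page-aligned
    buf.write(b"\xaa" * block)   # arbitrary fill pattern (hypothetical)
    written = 0
    for off in hot_offsets(zone, block):
        written += os.pwrite(fd, buf, off)
    buf.close()
    return written
```

The real test would call something like hot_write_pass() in a loop for
as long as the run lasts. Pointing it at a block device is just as
destructive as the original, so try it on a scratch file first
(direct=False if the filesystem rejects O_DIRECT).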

The second proc O_DIRECT writes to two adjacent sectors far away from
the hot writes from the first proc, and it puts in a timestamp from just
before the write. Every second or so, this timestamp is printed to
stderr. The drive will want to keep these two sectors in cache because
we are constantly overwriting them.
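
A minimal sketch of the timestamp side, again not the attached program.
The on-disk layout (two little-endian 64-bit integers, seconds then
microseconds, zero-padded to two 512-byte sectors), the TS_OFFSET value
and the function names are all hypothetical; the mail only says the
sectors are far away from the hot writes. For brevity this uses plain
pwrite/pread; the real test writes with O_DIRECT, which would also need
an aligned buffer.

```python
import os
import struct
import time

SECTOR = 512
TS_LEN = 2 * SECTOR           # two adjacent sectors
TS_OFFSET = 48 * 1024 * 1024  # hypothetical location, away from the hot zone


def pack_stamp(sec, usec):
    """Encode (sec, usec) and zero-pad to exactly two sectors."""
    return struct.pack("<qq", sec, usec).ljust(TS_LEN, b"\0")


def unpack_stamp(block):
    """Decode (sec, usec) from the start of a timestamp block."""
    return struct.unpack_from("<qq", block)


def write_stamp(fd, offset=TS_OFFSET):
    """Take the time just before the write, then overwrite the block."""
    sec, usec = divmod(time.time_ns() // 1000, 1_000_000)
    os.pwrite(fd, pack_stamp(sec, usec), offset)
    return sec, usec


def read_stamp(fd, offset=TS_OFFSET):
    """Read back whatever timestamp is currently stored (the -c case)."""
    return unpack_stamp(os.pread(fd, TS_LEN, offset))
```

The tester proc would call write_stamp() in a tight loop and print the
returned value about once a second; read_stamp() is what a check pass
after the power cut would do.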

(It's worth mentioning this is a destructive test. Running it
on /dev/sdb will overwrite the first 64MB of the drive!!!!)

Sample output:

# ./wb-latency /dev/sdb
Found tv 1238434622.461527
starting hot writes run
starting tester run
current time 1238435045.529751
current time 1238435046.531250
...
current time 1238435063.772456
current time 1238435064.788639
current time 1238435065.814101
current time 1238435066.847704

Right here, I pull the power cord. The box comes back up, and I run:

# ./wb-latency -c /dev/sdb
Found tv 1238435067.347829

When -c is passed, it just reads the timestamp out of the timestamp
block and exits. You compare this value with the value printed just
before you pulled the plug.

For the run here, the two values are about .5s apart. The
tester only prints the time every one second, so anything that close is
very good. I had pulled the plug before the drive got into that fast
8MB/s mode, so the drive was doing a pretty good job of fairly servicing
the cache.
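
The comparison can be done by hand, but a small helper avoids float
rounding on timestamps this large. This is only an illustration; the
function names are made up, and the two values are the last "current
time" line and the "Found tv" line from the run above.

```python
def to_usec(stamp):
    """Convert a 'seconds.microseconds' string to integer microseconds."""
    sec, _, frac = stamp.partition(".")
    return int(sec) * 1_000_000 + int(frac.ljust(6, "0")[:6])


def gap_usec(last_printed, found):
    """Microseconds between the last printed time and the value on disk."""
    return to_usec(found) - to_usec(last_printed)


gap = gap_usec("1238435066.847704", "1238435067.347829")
print(gap)  # 500125 microseconds, i.e. roughly half a second
```

A positive gap means the block on disk is newer than the last line
printed, which is expected since the tester keeps writing between
prints.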

My drive has a cache of 32MB. Smaller caches probably need a smaller
hot zone.

-chris


