Re: Soft-Updates for Linux ?

From: Andreas Dilger (adilger@turbolinux.com)
Date: Mon Oct 02 2000 - 11:54:02 EST


Albert Cahalan writes:
> Robert Redelmeier writes:
> > Daniel Phillips wrote in part:
>
> >> One thing to keep in mind in all of this is: nobody is testing the
> >> reliability of their journalling or any other kind of filesystem just by
> >> running it. To test these things you have to crash/interrupt the system
> >> *a lot*. Otherwise, you are just fooling yourself and everybody else.
> >> How many crashes does it take to find that one little window of
> >> vulnerability that comes up every 10,000 crashes normally but suddenly
> >> starts coming up every time just because your customer uses their system
> >> a different way? You're doing the right thing by crash-testing it, now
> >> instead of doing it 5 times do it 1,000 times. Here's one of my
> >> favorite tests: unzip a kernel source tree and wait until the disk light
> >> goes out. A second or so after it comes on again (kflushd) hit the
> >> reset button.
> >
> > Good idea. I certainly believe in stressing hardware (see .sig),
> > but I'm not sure this test is rigorous enough. The problem is
> > the reset button is only connected to the CPU and the hard disk
> > will probably continue to write out sectors from its hw buffer.
> > OTOH, I don't like the idea of pulling the plug too often. It's
> > very hard on the hardware. I'd expect a mechanical disk failure
> > before 10,000 cycles.
>
> The nice way to develop this code is with a block device that
> discards all writes after a timer goes off.

I made a patch to the loopback device which allows you to discard I/Os
going to disk. You can either activate it via an ioctl from user space,
or via a function call in the kernel.
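
As a rough illustration of the idea only (this is not the patch itself),
here is a user-space sketch of a device that can be told to drop writes.
Everything in it is hypothetical: the sim_dev structure, the sim_* function
names, and the sector sizes are made up for the example, and
sim_set_discard() merely stands in for the ioctl or in-kernel call. It also
assumes a discarded write still reports success to the caller, which is
what makes it resemble a crash with data left unwritten.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 8

/* Hypothetical in-memory stand-in for a loop device (not the patch's code). */
struct sim_dev {
    unsigned char data[NUM_SECTORS][SECTOR_SIZE];
    int discard_writes;     /* nonzero: writes are silently dropped */
    unsigned long dropped;  /* how many writes have been dropped */
};

static void sim_init(struct sim_dev *dev)
{
    memset(dev, 0, sizeof(*dev));
}

/* Stands in for the ioctl / kernel function call that toggles discarding. */
static void sim_set_discard(struct sim_dev *dev, int on)
{
    dev->discard_writes = on;
}

/*
 * Returns 0 on success and -1 for an out-of-range sector.
 * Assumption: a discarded write still returns 0, so the caller
 * cannot tell that the data never reached the backing store.
 */
static int sim_write(struct sim_dev *dev, size_t sector, const void *buf)
{
    if (sector >= NUM_SECTORS)
        return -1;
    if (dev->discard_writes) {
        dev->dropped++;
        return 0;
    }
    memcpy(dev->data[sector], buf, SECTOR_SIZE);
    return 0;
}

/* Reads always return whatever last reached the backing store. */
static int sim_read(const struct sim_dev *dev, size_t sector, void *buf)
{
    if (sector >= NUM_SECTORS)
        return -1;
    memcpy(buf, dev->data[sector], SECTOR_SIZE);
    return 0;
}
```

A test would write some data, call sim_set_discard(&dev, 1) at the chosen
"crash" point, keep writing, and then check that recovery copes with only
the earlier writes being present.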

You can also make reads fail, but this was not very useful for me because
it caused the ASSERTs in ext3 to oops. The read "failures" are also not
the same as the real thing, so it may not be a valid test: they only
return a zeroed page, rather than really causing a non-up-to-date page.
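
To make the difference concrete, here is a small user-space sketch of the
two behaviours. It is not code from the patch: the read_mode enum,
sim_read_page(), and PAGE_SZ are hypothetical names, and modelling a real
failure as a -EIO return is an assumption made for illustration. The point
is that a faked failure hands back zeroes with a success code, so the
caller has no way to notice anything went wrong.

```c
#include <errno.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical read behaviours, for illustration only. */
enum read_mode {
    READ_OK,          /* normal read: copy the data */
    READ_FAKE_ZERO,   /* faked failure: zeroed page, success code */
    READ_REAL_ERROR   /* modelled real failure: error code, no data */
};

/*
 * Returns 0 on success or -EIO on a modelled real failure.
 * In the READ_REAL_ERROR case dst is left untouched, loosely
 * standing in for a page that never becomes up to date.
 */
static int sim_read_page(enum read_mode mode, const unsigned char *src,
                         unsigned char *dst)
{
    switch (mode) {
    case READ_FAKE_ZERO:
        memset(dst, 0, PAGE_SZ);
        return 0;
    case READ_REAL_ERROR:
        return -EIO;
    case READ_OK:
    default:
        memcpy(dst, src, PAGE_SZ);
        return 0;
    }
}
```

Code that checks the return value exercises its error path only in the
READ_REAL_ERROR case; in the READ_FAKE_ZERO case it carries on and parses
zeroes as if they were valid on-disk data.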

I used it quite a bit when developing the orphan code for ext3, and for
testing journal integration in InterMezzo. You can use it to test against
a loopback file or a loopback-mounted block device, but as with regular
loopback devices, there is a 2GB limit.

I posted it to fsdevel a few months ago, but I have also uploaded it to:
ftp://ftp.stelias.com/pub/adilger/loopdiscard-2.2.16.patch
ftp://ftp.stelias.com/pub/adilger/loop_discard.c

The loop_discard.c program simply calls the ioctl to enable or disable
I/O on the specific loop device. Unconfiguring the loop device also
resets the I/O status.

Cheers, Andreas

-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert