Re: safe file systems

Alan Cox (alan@lxorguk.ukuu.org.uk)
Wed, 24 Sep 1997 18:42:04 +0100 (BST)


> Do you think it would be possible to build a safe, slow file system?
> By safe, I mean that I could hit reset in the middle of 50 parallel
> un-tars and reboot the system and the file system comes up clean (no fsck,
> but data loss)?

I don't think you can do it without an fsck. I think you can do it with
a minimal "always works" fsck that only needs to run occasionally.

Assume you write your minixfs or ext2fs such that you do:

Create/Write:
    if need be, allocate the inode and write it as blank
    allocate the blocks
    write the block allocation table update
    write the block pointers into the inode
    sync-point
    write the data to the blocks
    write the file extent info
    sync-point
    write the file length field + inode data + mark the inode not deleted
    add to the directory if needed (same ordering rules as the above)

Delete:
    unlink the directory entry
    sync-point
    write the inode as deleted
    sync-point
    write the blocks as free
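
Roughly, in C (a flat file standing in for the disk, fsync() standing in
for a sync-point; the OFF_* layout and struct inode_rec are made up for
the sketch, not the real minixfs/ext2fs format, and error checking is
pared down):

#define _XOPEN_SOURCE 500
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 1024
#define OFF_BITMAP (1 * BLOCK_SIZE)     /* block allocation table */
#define OFF_INODES (2 * BLOCK_SIZE)     /* inode table */
#define OFF_DATA   (8 * BLOCK_SIZE)     /* data blocks start here */

struct inode_rec {                      /* hypothetical on-disk inode */
    unsigned int flags;                 /* 0 = blank/deleted */
    unsigned int length;
    unsigned int block[8];              /* direct block pointers */
};

static int write_inode(int fd, int ino, const struct inode_rec *ip)
{
    off_t off = OFF_INODES + (off_t)ino * sizeof(*ip);
    return pwrite(fd, ip, sizeof(*ip), off) == (ssize_t)sizeof(*ip) ? 0 : -1;
}

/* Create/Write, in the order given above. */
static int ordered_create(int fd, int ino, int blk,
                          const char *data, unsigned int len)
{
    struct inode_rec inode;
    unsigned char used = 1;

    memset(&inode, 0, sizeof(inode));
    write_inode(fd, ino, &inode);               /* inode written blank */
    pwrite(fd, &used, 1, OFF_BITMAP + blk);     /* allocation table */
    inode.block[0] = blk;                       /* block pointers */
    write_inode(fd, ino, &inode);
    if (fsync(fd) < 0)                          /* sync-point */
        return -1;

    pwrite(fd, data, len, OFF_DATA + (off_t)blk * BLOCK_SIZE);
    if (fsync(fd) < 0)                          /* sync-point */
        return -1;

    inode.length = len;                         /* only now go live */
    inode.flags = 1;
    write_inode(fd, ino, &inode);
    return fsync(fd);   /* the directory entry would follow, under
                           the same rules */
}

/* Delete: unlink, then kill the inode, then free the blocks. */
static int ordered_delete(int fd, int ino, int blk)
{
    struct inode_rec blank;
    unsigned char free_mark = 0;

    /* the directory unlink plus its sync-point would go here */
    memset(&blank, 0, sizeof(blank));
    write_inode(fd, ino, &blank);               /* inode now deleted */
    if (fsync(fd) < 0)                          /* sync-point */
        return -1;
    pwrite(fd, &free_mark, 1, OFF_BITMAP + blk);
    return fsync(fd);                           /* blocks freed last */
}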

Then it's slow to write, but you know the failure cases are simple -
if a file is in a directory it is valid and has an inode and data. If it's
not in a directory then it is free, but its inode, blocks or extents may
not yet have been returned to the free maps. That means your fsck is
really a free-space hoover.
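
The cleanup itself is then about this much, run over in-core copies of
the tables (dir_refs, hoover_free_space and the fixed table sizes are
made up for the sketch, not a real fsck's structures):

#include <string.h>

#define NINODES 64
#define NBLOCKS 256

struct inode_rec {                  /* same shape as the sketch above */
    unsigned int flags;
    unsigned int length;
    unsigned int block[8];          /* assumed < NBLOCKS */
};

/*
 * dir_refs[i] is non-zero if inode i appears in some directory.
 * Everything a directory references is valid by construction, so the
 * whole job is to recompute the allocation map from scratch and give
 * back whatever nothing points at.
 */
static void hoover_free_space(const unsigned char dir_refs[NINODES],
                              struct inode_rec inodes[NINODES],
                              unsigned char bitmap[NBLOCKS])
{
    unsigned char used[NBLOCKS];
    int i, j;

    memset(used, 0, sizeof(used));
    for (i = 0; i < NINODES; i++) {
        if (!dir_refs[i]) {
            /* in no directory: free by the rules above, blank it */
            memset(&inodes[i], 0, sizeof(inodes[i]));
            continue;
        }
        for (j = 0; j < 8; j++)
            if (inodes[i].block[j])
                used[inodes[i].block[j]] = 1;
    }
    memcpy(bitmap, used, NBLOCKS);  /* orphaned blocks become free */
}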

> Has anyone thought about this very much? If so, is there a mailing list or
> archive that I can browse?

I have been - it's important for embedded Linux boxes. I'm stuffed right
now because I have no real way of forcing that ordering of writes down to
the disk. In fact, if my tests are right, IDE drives are themselves
sometimes re-ordering my I/O requests.
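
The best user space can do is complete one write before issuing the
next, something like this (the filename is just an example). Even that
only orders requests as far as the kernel and the controller; a drive
doing write-back caching can ack the data and still reorder it
internally:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("ordered.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    write(fd, "first", 5);
    fsync(fd);          /* barrier: "first" must be stable... */
    write(fd, "second", 6);
    fsync(fd);          /* ...before the dependent "second" lands */
    close(fd);
    return 0;
}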

The other problem is that turning off an IDE drive during a write can
create permanent bad blocks (take an old 40MB drive and yank its power a
few times), so an fsck or cleanup has to do some kind of remap around those.
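
A cleanup pass could probe for those and route around them along these
lines (the remap table and the spare-block scheme are made up for the
sketch):

#define _XOPEN_SOURCE 500
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 1024
#define NBLOCKS    256

/* remap[b] is the spare block now standing in for block b, or -1 if
   block b is still readable in place (an invented table) */
static int remap[NBLOCKS];

static int probe_and_remap(int fd, int first_spare)
{
    char buf[BLOCK_SIZE];
    int b, spare = first_spare, bad = 0;

    for (b = 0; b < NBLOCKS; b++) {
        remap[b] = -1;
        if (pread(fd, buf, BLOCK_SIZE,
                  (off_t)b * BLOCK_SIZE) == BLOCK_SIZE)
            continue;           /* block reads back, leave it alone */
        /* unreadable: permanently steer this block to a spare */
        remap[b] = spare++;
        bad++;
    }
    return bad;                 /* how many blocks got remapped */
}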

Flash is an option for avoiding some of these problems, and synchronous
flash writes are cheaper than synchronous hard disk writes, but flash is
expensive and wears out.

Alan