Re: Soft metadata updates paper w/code

Theodore Y. Ts'o (tytso@MIT.EDU)
Thu, 24 Jul 1997 08:22:31 -0400


Date: Thu, 24 Jul 1997 10:37:05 +0200 (MET DST)
From: Ingo Molnar <mingo@pc7537.hil.siemens.at>

>> write succeeds, so I assume that during the duration of the write,
>> access to that disk block is locked out. This could be a contention
>> issue for heavily accessed directories (like /tmp) or block bitmaps.

> it basically double-buffers changes. There is the 'main copy', which is
> used for disk-IO, and there is the 'outstanding modifications graph',
> which is in most cases a simple one-entry structure. Two outstanding
> modifications can be 'coalesced', the latter one superseding the first
> one.

Umm... no, that's not right. Changes are made twice, first to the
"main copy", and then to the "outstanding modifications graph", which
contains enough information to roll modifications _forward_ and
_backward_.
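To make "changes are made twice" concrete, here is a minimal C sketch, assuming a hypothetical dependency record that covers a single 32-bit field of a metadata block. The names meta_buf, dep_record, apply_change, roll_backward and roll_forward are invented for illustration; real soft updates tracks several kinds of dependencies, not just one field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define BLOCK_SIZE 1024

/* In-memory copy of one metadata block (the "main copy"). */
struct meta_buf {
    unsigned char data[BLOCK_SIZE];
};

/* Hypothetical dependency record for one 32-bit field: it keeps both
 * values so the field can be rolled backward or forward at will. */
struct dep_record {
    size_t offset;     /* byte offset of the field in the block */
    uint32_t old_val;  /* value that is safe to have on disk for now */
    uint32_t new_val;  /* current value, as in-memory users see it */
};

static uint32_t get_field(const struct meta_buf *b, size_t off)
{
    uint32_t v;
    memcpy(&v, b->data + off, sizeof v);
    return v;
}

static void set_field(struct meta_buf *b, size_t off, uint32_t v)
{
    memcpy(b->data + off, &v, sizeof v);
}

/* The change is made twice: once in the main copy, once in the record. */
static void apply_change(struct meta_buf *b, struct dep_record *r,
                         size_t off, uint32_t val)
{
    assert(off + sizeof(uint32_t) <= BLOCK_SIZE);
    r->offset = off;
    r->old_val = get_field(b, off);
    r->new_val = val;
    set_field(b, off, val);   /* later lookups simply read the main copy */
}

/* Roll back to the on-disk-safe value (expects the up-to-date state). */
static void roll_backward(struct meta_buf *b, const struct dep_record *r)
{
    assert(get_field(b, r->offset) == r->new_val);
    set_field(b, r->offset, r->old_val);
}

/* Roll forward again (expects the rolled-back state). */
static void roll_forward(struct meta_buf *b, const struct dep_record *r)
{
    assert(get_field(b, r->offset) == r->old_val);
    set_field(b, r->offset, r->new_val);
}
```

Because the main copy always holds the latest value, ordinary readers never have to consult the record; only the write path uses it.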

You have to make changes in both the main copy and the dependency
structures; think about it. Future references to the metadata *either*
need to get the up-to-date information from the main copy, *OR* the
code which references the metadata would have to first search the
"outstanding modifications" structure, and then search the main copy.
This is both (a) complex and (b) slow.

The paper quite explicitly states that the changes in the main copy are
*undone*, written to disk, and then *redone*. This way, in-memory
references to the filesystem get all of the latest changes, but what is
written to disk may not have the latest changes, in order to preserve
metadata consistency.
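The undo/write/redo cycle can be sketched in C as follows, again with hypothetical names (meta_buf, dep, record_change, write_block_safely) and a memory array standing in for the disk. When several pending changes touch the same field, undoing newest-first and redoing oldest-first returns the field to the right value in both directions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define BLOCK_SIZE 1024
#define MAX_DEPS   16

/* Hypothetical rollback record for one 32-bit field whose dependency
 * has not been satisfied yet. */
struct dep {
    size_t offset;
    uint32_t old_val;
    uint32_t new_val;
};

struct meta_buf {
    unsigned char data[BLOCK_SIZE];  /* main copy: always up to date */
    struct dep deps[MAX_DEPS];       /* unsatisfied dependencies */
    int ndeps;
};

/* Simulated disk block standing in for real I/O. */
static unsigned char disk_block[BLOCK_SIZE];

static uint32_t get32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void put32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Update the main copy and remember how to undo/redo the change. */
static void record_change(struct meta_buf *b, size_t off, uint32_t val)
{
    struct dep *d;
    assert(b->ndeps < MAX_DEPS && off + sizeof(uint32_t) <= BLOCK_SIZE);
    d = &b->deps[b->ndeps++];
    d->offset = off;
    d->old_val = get32(b->data + off);
    d->new_val = val;
    put32(b->data + off, val);
}

/* Undo, write, redo.  In this single-threaded sketch the "I/O" is
 * synchronous, so nothing can read the rolled-back state in between. */
static void write_block_safely(struct meta_buf *b)
{
    int i;
    for (i = b->ndeps - 1; i >= 0; i--)      /* undo, newest first */
        put32(b->data + b->deps[i].offset, b->deps[i].old_val);
    memcpy(disk_block, b->data, BLOCK_SIZE);  /* disk sees only safe state */
    for (i = 0; i < b->ndeps; i++)           /* redo, oldest first */
        put32(b->data + b->deps[i].offset, b->deps[i].new_val);
}
```

For example, recording 5 and then 7 at the same offset and writing the block leaves 0 on the simulated disk and 7 in memory.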

> the block device interface does not have to know about this method at all,
> it only has to kick a 'block has finished' handler, which it does already,
> sort of. If done right, this interface could be built
> filesystem-independent, although i guess the first implementation will be
> ext2fs based?

The block device also needs to call back into the filesystem layer to do
the undo operation before the write is issued, and then, when the disk
write completes, to do the redo operation.
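A C sketch of that division of labour, assuming a hypothetical pair of per-buffer callbacks (buffer_ops with pre_write and write_done); this is not the Linux buffer cache's actual interface. The block layer stays generic and only invokes the hooks, while a toy filesystem supplies the undo and redo code.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

#define BLOCK_SIZE 1024

struct buffer;

/* Hypothetical callbacks the filesystem attaches to a metadata buffer.
 * The block layer only invokes them; it knows nothing about metadata. */
struct buffer_ops {
    void (*pre_write)(struct buffer *bh);   /* filesystem rolls back (undo) */
    void (*write_done)(struct buffer *bh);  /* filesystem rolls forward (redo) */
};

struct buffer {
    unsigned char data[BLOCK_SIZE];
    const struct buffer_ops *ops;  /* NULL for ordinary data blocks */
    void *fs_private;              /* filesystem's dependency state */
};

static unsigned char fake_disk[BLOCK_SIZE];

/* Generic block-layer write path.  I/O is simulated synchronously; a real
 * driver would invoke write_done from its completion handler instead. */
static void block_write(struct buffer *bh)
{
    assert(bh != NULL);
    if (bh->ops && bh->ops->pre_write)
        bh->ops->pre_write(bh);
    memcpy(fake_disk, bh->data, BLOCK_SIZE);
    if (bh->ops && bh->ops->write_done)
        bh->ops->write_done(bh);
}

/* A toy filesystem's side: one pending one-byte change. */
struct toy_dep {
    size_t off;
    unsigned char old_val, new_val;
};

static void toy_undo(struct buffer *bh)
{
    const struct toy_dep *d = bh->fs_private;
    bh->data[d->off] = d->old_val;
}

static void toy_redo(struct buffer *bh)
{
    const struct toy_dep *d = bh->fs_private;
    bh->data[d->off] = d->new_val;
}

static const struct buffer_ops toy_ops = { toy_undo, toy_redo };
```

The block layer needs the two hook points, but everything that knows what "undo" means lives on the filesystem side of the function pointers.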

Alternatively, you can double-buffer things by copying the block to
scratch space and doing the undo operations on the copy, but then you
have to pay the cost of the copy, and the buffer layer needs to know that
it should write the block from the scratch space instead of from the
main in-memory copy.
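The copy-based alternative, sketched in C under the same kind of assumptions (hypothetical names, simulated synchronous I/O). The main copy is never rolled back, so there is no redo step, but every write pays for a full block copy and the I/O has to be issued from the scratch buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define BLOCK_SIZE 1024

/* Hypothetical rollback record for one 32-bit field. */
struct dep {
    size_t offset;
    uint32_t old_val;
};

/* Simulated disk block standing in for real I/O. */
static unsigned char disk_block[BLOCK_SIZE];

/* Copy-then-undo: in-memory users never see stale data, at the price of
 * a block copy and of writing from 'scratch' rather than the main copy. */
static void write_via_scratch(const unsigned char *main_copy,
                              const struct dep *deps, int ndeps)
{
    unsigned char scratch[BLOCK_SIZE];
    int i;

    memcpy(scratch, main_copy, BLOCK_SIZE);        /* the extra copy */
    for (i = ndeps - 1; i >= 0; i--) {             /* undo in the copy only */
        assert(deps[i].offset + sizeof(uint32_t) <= BLOCK_SIZE);
        memcpy(scratch + deps[i].offset, &deps[i].old_val, sizeof(uint32_t));
    }
    memcpy(disk_block, scratch, BLOCK_SIZE);       /* "I/O" from scratch */
    /* No redo step: main_copy was never modified. */
}
```

With real asynchronous I/O the scratch buffer could not live on the stack as it does here; it would have to stay allocated until the write completes.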

Finally, it can't be filesystem-independent, because the code to do the
undo and redo operations inherently must be filesystem-dependent: it
depends on the layout of the metadata.
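To illustrate why the undo step cannot be generic, here is a C sketch with two kinds of metadata block using hypothetical, simplified layouts (fixed 32-byte directory entries whose first 4 bytes are the inode number, and 128-byte inodes with block pointers starting at byte 40). These constants are assumptions for illustration, not any real filesystem's on-disk format; the point is that even locating the bytes to roll back requires knowing the layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define BLOCK_SIZE  1024
#define INODE_SIZE  128   /* hypothetical on-disk inode size */
#define IBLOCK_OFF  40    /* hypothetical offset of block pointers in an inode */
#define DIRENT_SIZE 32    /* hypothetical fixed-size directory entry */

enum meta_kind { META_DIRBLOCK, META_INODEBLOCK };

/* Hypothetical rollback record; its fields only make sense once the
 * filesystem's layout is known. */
struct rollback {
    enum meta_kind kind;
    unsigned slot;      /* directory entry index, or inode index in block */
    unsigned ptr_idx;   /* which block pointer (inode blocks only) */
    uint32_t old_val;   /* previous, on-disk-safe value */
};

static void put32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

static void fs_undo(unsigned char *block, const struct rollback *r)
{
    size_t off;
    switch (r->kind) {
    case META_DIRBLOCK:
        /* New name whose inode isn't on disk yet: restore the entry's
         * inode number (first 4 bytes of the entry). */
        off = (size_t)r->slot * DIRENT_SIZE;
        break;
    case META_INODEBLOCK:
        /* New block pointer whose target isn't safe yet: restore it. */
        off = (size_t)r->slot * INODE_SIZE + IBLOCK_OFF
            + (size_t)r->ptr_idx * sizeof(uint32_t);
        break;
    default:
        return;
    }
    assert(off + sizeof(uint32_t) <= BLOCK_SIZE);
    put32(block + off, r->old_val);
}
```

A different filesystem, or even a different version of the same one, would need a different fs_undo, which is exactly why this code belongs to the filesystem rather than the block layer.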

- Ted