Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation

From: Juan Piernas Canovas
Date: Thu Feb 22 2007 - 14:57:33 EST


Hi Jörn,

On Thu, 22 Feb 2007, [utf-8] Jörn Engel wrote:

A partial segment is a transaction unit, and contains "all" the blocks
modified by a file system operation, including indirect blocks and i-nodes
(actually, it contains the blocks modified by several file system
operations, but let us assume that every partial segment only contains the
blocks modified by a single file system operation).

So, the above figure is as follows in DualFS:

Before:
Segment 1: [some data] [ D0 D1 D2 I ] [more data]
Segment 2: [ some data ]
Segment 3: [ empty ]

If the datablock D0 is modified, what you get is:

Segment 1: [some data] [ garbage ] [more data]
Segment 2: [ some data ]
Segment 3: [ D0 D1 D2 I ] [ empty ]

You have fairly strict assumptions about the "Before:" picture. But

The "Before" figure is intentionally simple because it is enough to understand how the meta-data device of DualFS works, and why the cleaner is deadlock-free ;-)

what happens if those assumptions fail. To give you an example, imagine
the following small script:

$ for i in `seq 1000000`; do touch $i; done

This will create a million dentries in one directory. It will also
create a million inodes, but let us ignore those for a moment. It is
fairly unlikely that you can fit a million dentries into [D0], so you
will need more than one block. Let's call them [DA], [DB], [DC], etc.
So you have to write out the first block [DA].

Before:
Segment 1: [some data] [ DA D1 D2 I ] [more data]
Segment 2: [ some data ]
Segment 3: [ empty ]

If the datablock D0 is modified, what you get is:

Segment 1: [some data] [ garbage ] [more data]
Segment 2: [ some data ]
Segment 3: [ DA D1 D2 I ] [ empty ]

That is exactly your picture. Fine. Next you write [DB].

Before: see above
After:
Segment 1: [some data] [ garbage ] [more data]
Segment 2: [ some data ]
Segment 3: [ DA][garbage] [ DB D1 D2 I ] [ empty]

You write [DC]. Note that Segment 3 does not have enough space for
another partial segment:

Segment 1: [some data] [ garbage ] [more data]
Segment 2: [ some data ]
Segment 3: [ DA][garbage] [ DB][garbage] [wasted]
Segment 4: [ DC D1 D2 I ] [ empty ]

You write [DD] and [DE]:
Segment 1: [some data] [ garbage ] [more data]
Segment 2: [ some data ]
Segment 3: [ DA][garbage] [ DB][garbage] [wasted]
Segment 4: [ DC][garbage] [ DD][garbage] [wasted]
Segment 5: [ DE D1 D2 I ] [ empty ]

And some time later you even have to switch to a new indirect block, so
you get before:

Segment n : [ DX D1 D2 I ] [ empty ]

After:

Segment n : [ DX D1][garb] [ DY DI D2 I ] [ empty]

What you end up with after all this is quite unlike you "Before"
picture. Instead of this:

Segment 1: [some data] [ D0 D1 D2 I ] [more data]


I agree with all the above, althoug it displays the wort case because a partial segment usually contains several datablocks, and a few indirect blocks. But it is fine for our purposes.

You may have something closer to this:

Segment 1: [some data] [ D1 ] [more data]
Segment 2: [some data] [ D0 ] [more data]
Segment 3: [some data] [ D2 ] [more data]


I do not agree with this picture, because it does not show that all the indirect blocks which point to a direct block are along with it in the same segment. That figure should look like:

Segment 1: [some data] [ DA D1' D2' ] [more data]
Segment 2: [some data] [ D0 D1' D2' ] [more data]
Segment 3: [some data] [ DB D1 D2 ] [more data]

where D0, DA, and DB are datablocks, D1 and D2 indirect blocks which point to the datablocks, and D1' and D2' obsolete copies of those indirect blocks. By using this figure, is is clear that if you need to move D0 to clean the segment 2, you will need only one free segment at most, and not more. You will get:

Segment 1: [some data] [ DA D1' D2' ] [more data]
Segment 2: [ free ]
Segment 3: [some data] [ DB D1' D2' ] [more data]
......
Segment n: [ D0 D1 D2 ] [ empty ]

That is, D0 needs in the new segment the same space that it needs in the previous one.

The differences are subtle but important.

Regards,

Juan.

You should try the testcase and look at a dump of your filesystem
afterwards. I usually just read the raw device in a hex editor.

Jörn



--
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34968367657 Fax: +34968364151
email: piernas@xxxxxxxxxxx
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index

*** Por favor, envíeme sus documentos en formato texto, HTML, PDF o PostScript :-) ***