Re: [PATCH] cowlinks v2

From: Eric W. Biederman
Date: Mon Mar 29 2004 - 14:38:41 EST


Jamie Lokier <jamie@xxxxxxxxxxxxx> writes:

> Eric W. Biederman wrote:
> > The file will become a cow file only after it is modified or it's
> > containing directory is modified.
>
> Eh? The file (or directory) must be labelled as a cowlinked file the
> moment you make the cowlink, not when the data is modified. It's
> _breaking_ the cowlink that happens when the data (or directory
> contents) are modified.

We have the mechanism of:
> You cowlinked a directory. That converts the original directory inode
> to a cowlink, creates another cowlink, and creates a shared inode
> which now contains the directory.
>
> Then you modify the directory or anything below it. That duplicates
> the directory, breaking the directory cowlinks and duplicating the
> shared directory inode -- so that the two directory cowlink inodes
> become normal directory inodes. The directory duplication results in
> two directory which are full of cowlinks -- every object in the
> original directory is cowlinked by this operation.

This mechanism does not label all files below a directory as cowlinked
the moment the cowlink.

Therefore cowlink on a directory as described may be an interesting
operation but it is not an atomic snapshot of the directory and
it's contents.

> > Thus you can have data in the
> > file that was written after the snapshot operation finished, but
> > before the individual file itself is marked cow.
>
> The creation of a cowlink should be atomic w.r.t. writing.
>
> Specifically, the operation which moves the contents of a non-cowlink
> inode to a newly created shared inode, and converts the original
> non-cowlink inode to a cowlink inode, should be atomic.
>
> Is there an unavoidable race condition? I don't see one.

Only if the following apply.

- A correct cow on a directory is considered to be an atomic snapshot
of both the directory and everything below it.
- You break the cow on the directories on demand, as above.

Scenario. The directory tree looks like:

/dir1 (inode 1)/link1 (inode 10)
/dir2 (inode 2)/link2 (inode 10)

Then a cow link is performed:

ln --cow dir2 cow1

Resulting in a directory tree that looks like:
/dir1 (inode 1)/link1 (inode 10)
/dir2 (inode 2) -> (inode 3)/link2 (inode 10)
/cow1 (inode 4) -> (inode 3)/link2 (inode 10)


Then we have several things that could happen.
Scenario A: fd = open(/cow1/link2); write(fd, ....);
Scenario B: fd = open(/dir2/link2); write(fd, ....);
Scenario C: fd = open(/dir1/link1); write(fd, ....);

**
In Scenario A the open breaks the cow on the directory, so that
the cow copy can have it's own inode as:

/dir1 (inode 1)/link1 (inode 10) -> (inode 11)
/dir2 (inode 2)/link2 (inode 10) -> (inode 11)
/cow1 (inode 3)/link2 (inode 12) -> (inode 11)

Then the write occurs the cow is broken and the tree looks like:

/dir1 (inode 1)/link1 (inode 10)
/dir2 (inode 2)/link2 (inode 10) -> (inode 11)
/cow1 (inode 3)/link2 (inode 12)

**
In Scenario B the open breaks the cow on the directory, so that
the cow copy can potentially have it's own inode as:

/dir1 (inode 1)/link1 (inode 10) -> (inode 11)
/dir2 (inode 2)/link2 (inode 10) -> (inode 11)
/cow1 (inode 3)/link2 (inode 12) -> (inode 11)

Then the write occurs the cow is broken and the tree looks like:

/dir1 (inode 1)/link1 (inode 10)
/dir2 (inode 2)/link2 (inode 10)
/cow1 (inode 3)/link2 (inode 12) -> (inode 11)


**
In Scenario C their is no open for the cow to break and
things proceed normally the directory tree remains looking like:

/dir1 (inode 1)/link1 (inode 10)
/dir2 (inode 2) -> (inode 3)/link2 (inode 10)
/cow1 (inode 3) -> (inode 3)/link2 (inode 10)

Then when the write occurs there is not cow to break, and the
tree remains looking like:

/dir1 (inode 1)/link1 (inode 10)
/dir2 (inode 2) -> (inode 3)/link2 (inode 10)
/cow1 (inode 3) -> (inode 3)/link2 (inode 10)

So I see a problem with Scenario C. Perhaps you can refute it.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/