Re: Transparent mounts

Riley Williams (rhw@MemAlpha.CX)
Thu, 18 Nov 1999 13:58:38 +0000 (GMT)


Hi Alexander, Horst, and also everybody else who responded.

First, can I state that the response has easily proven that this idea
is in demand - 170 emails on the subject all interested in it just
can't mean anything else. It's also far too many for me to reply to
individually, hence my choosing these two to respond to on behalf of
them all, with snippets from one or two others included...

*** Horst Von Brandt commented...

>> Basically, one facility I could use is that of being able to
>> mount two or more partitions on top of each other and have all
>> the files from all of the partitions available for reading. This
>> is what I call a transparent mount.

>> I don't know much about how the current mounting process works,
>> so I'm not currently in a position to try implementing this, but
>> I intend to learn enough to do so. However, I'd like to ensure
>> that there aren't any serious problems with the whole concept
>> before I start.

>> Basically, I see it initially working such that for a transparent
>> mount to succeed, BOTH of the following MUST be true:

>> 1. The partition(s) being mounted over must already be mounted
>> read-only. There is therefore no problem with where to put
>> any newly created files on such a system since such can't be
>> created in the first place.

> Many would like to mount a "live" filesystem from CD, and be
> able to "replace" files by ones on disk (i.e., / on CD, or the
> distribution; changes/updates on disk).

Many people have made the same mistake, so can I clarify the above. If
you read through it, you will note that I said...

>> Basically, I see it initially working such that for a transparent

...and when I said "initially working", I meant that the initial
implementation would have this requirement so that people (myself
included) could at least get something working to try out, and that
the ability to create, modify or delete files would come later.

>> 2. There must not be any name clashes between the contents of
>> the partition(s) already mounted at that point and that of
>> the root directory of the partition being transparently
>> mounted on top of it.

> The whole point would be to be able to selectively override files
> on read-only media, AFAIKS. Even "delete" them.

Again, this is something that would come later, once the initial
implementation was working.

>> I can see that verifying this second condition would be likely
>> to be time-consuming on some systems, so for an initial
>> implementation, I would suggest the following additional
>> condition:

>> 3. The total number of entries in the common directory, when
>> totalled from all partitions mounted thereon, must not be
>> such as to require more than one page of storage.

> That is quite limiting.

Agreed, but then, it is also for the initial implementation, so some
limitations should be expected. That is the first limitation I would
expect to lift, but an implementation that had that limitation would
still be quite useful.

> OTOH, semantics are hard to come by...

> - If I mount A, B, C; each containing file f. Which f do I see?
> Last one mounted, first one, individually selectable (how?)

> - What happens if I remount in another order later?

> - Assume the winning f above is the last one (from C). Now I
> delete f. What happens? Does f go away, do I see B's f? What
> if I unmount and then remount, does the change stick?

> AFAIKS, to do this cleanly would mean some kind of persistent
> overlay that keeps track of things, in addition to the
> filesystems themselves. The overlay keeps track of
> existing/nonexisting files, and points at them. But if you look
> a bit closer, this is almost the same as just populating a
> directory with symlinks to the real files elsewhere...

That is precicely why the above limitations on the INITIAL
implementation were suggested - they imply that these problems can be
left until later when an initial implementation is in place and can be
tried out.

*** Alexander Viro commented...

>>> Basically, one facility I could use is that of being able to
>>> mount two or more partitions on top of each other and have all
>>> the files from all of the partitions available for reading.
>>> This is what I call a transparent mount.

>> It sounds a lot like (BSD's?) union mount.

I have to admit that I hadn't come across that, but on reading it,
I'll definately agree with that...

>>> 2. There must not be any name clashes between the contents of
>>> the partition(s) already mounted at that point and that of
>>> the root directory of the partition being transparently
>>> mounted on top of it.

> You'll get some problems with readdir() on root, but that's
> doable.

I'll certainly watch out for that.

> If you are going to do it - do it fast, or I'll do it myself for
> final procfs split.

Go ahead. I'm not in a position to do it myself at the moment, as I've
far too much reading up to do before I'm ready to start. You will note
that I said as much in my initial post.

>> Under union-mount, the upper layer holds changes to the lower
>> layer, so conflict behavior is well defined: The upper layer
>> wins.

I'm not so certain that this behaviour is as well defined as is
claimed, and certainly I can see limitations with the behaviour as
described. See later...

>> (I'm not sure about a delete of a file that exists on both
>> layers. Is a 'deleted' entry stored on the upper FS layer?
>> Hmmm.)

> Yes.

What if the "upper FS layer" changes?

>>> I can see that verifying this second condition would be
>>> likely to be time-consuming on some systems, so for an
>>> initial implementation, I would suggest the following
>>> additional condition:

>>> 3. The total number of entries in the common directory,
>>> when totalled from all partitions mounted thereon,
>>> must not be such as to require more than one page of
>>> storage.

> Uuurgh for #3 and to some degree for #2 ;-) Check how BSD folks
> had done it.

I have had a look, and I can see limitations with their method,
several of which I would be unhappy with.

> R/O is also overkill if you are doing union-mount.

Not for an INITIAL implementation. As far as I can see, EVERY file
system implemented in Linux has first been implemented as a R/O file
system, and only later has R/W ability been added.

> Seriously, take the Daemon Book and check how it's done.

When you say "the Daemon book", which specific book are you referring
to?

*** Limitations

According to the documentation I've found or been pointed to regarding
the union file system, it includes the following limitations for it to
remain reliable:

1. All partitions involved must be mounted at the same time.

Why? What is wrong with the idea of adding new partitions
to an existing set? Certainly, for the purpose that I'm
interested in it, this requirement is a killer.

2. All partitions involved must be mounted in the same order
each time.

Again, why? OK, I can see that it's an easy way of dealing
with which partition to put changes on, but it's also an
extremely limiting one.

3. Although not explicitly stated, the examples shown all
imply that all partitions involved must be of the same
type.

I would certainly hope this is NOT the case.

Personally, I would be happier to see the following mounting semantics
used for dealing with this:

1. mount -o ro /dev/cdrom /mnt

Mounts an initial partition read-only as a basis for later
additions, or as a stand-alone mount.

2. mount -o ro,trans /dev/cdrw /mnt

Mounts a second or later partition transparently on top of
the existing ones, also mounted read-only.

3. mount -o rw,trans /dev/hda7 /mnt

Mounts a second or later partition transparently on top of
the existing ones, mounted read-write and labelled as the
one to be used for noting any changes.

The following restrictions could then be reasonably implemented:

1. The bottom partition MUST be read-only. Given these semantics,
this is not a problem.

2. In any stack of transparent mounts, AT MOST ONE may be
marked read-write. Any attempt to mount a second one as
read-write MUST fail. However, there is no problem with
a stack having no read-write partitions in it.

Note that whilst I've used "trans" as the option to specify this
behaviour, I'm not fussed as to its spelling - "wibble" would serve
just as well.

Best wishes from Riley.

PS: The kernel versions page is now back online at the URL below, and
includes separate sublists both for each kernel series, and for
each year of development.

+----------------------------------------------------------------------+
| There is something frustrating about the quality and speed of Linux |
| development, ie., the quality is too high and the speed is too high, |
| in other words, I can implement this XXXX feature, but I bet someone |
| else has already done so and is just about to release their patch. |
+----------------------------------------------------------------------+
* http://www.memalpha.cx/Linux/Kernel/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/