Re: filesystem transactions API

From: Charles P. Wright
Date: Tue Apr 26 2005 - 10:48:41 EST


On Tue, 2005-04-26 at 18:22 +0400, Artem B. Bityuckiy wrote:
> Jamie Lokier wrote:
> > I think I've wanted something like that for _years_ in unix.
> >
> > It's an old, old idea, and I've often wondered why we haven't implemented it.
> >
>
> I thought it is possible to rather easily to implement this on top
> of non-transactional FS (albeit I didn't try) and there is no need
> to overcomplicate an FS. Just implement a specialized user-space
> library and utilize it.
There are actually plenty of things that make it harder than it first
seems to provide ACID transactions. The two most difficult things are
going to be atomicity and isolation.

Atomicity is difficult, because you have lots of caches each with their
own bits of state (e.g., the inode/dentry caches). Assuming your
transaction is committed that isn't so much of a problem, but once you
have on rollback you need to undo any changes to those caches.

Isolation (this is the property that says that concurrent transactions
should be the same as if there was a serial execution) is also tricky to
get right. A transaction can touch any number of objects, and user-
applications may not respect any lock ordering --- which means you will
have deadlocks, and you must detect and resolve them (probably by
aborting one of the transactions).

None of these problems are insurmountable, and there are definitely good
reasons to use transactions. For example, RPM uses transactions to
update its own databases, it would be great if it could use transactions
to update the whole file system. Mail servers also have to go through
hoops to provide atomic updates. Isolation takes care of race
conditions.

At our lab, we've been experimenting with transactional file systems.
We've ported the Berkeley database to the kernel, because it already
provides ACID transactions. We've also built a simple file system on
top of it, with a rudimentary transactions API that is exposed to user-
level. One of the key things that we've learned is that it isn't very
easy to just "bolt" transactions onto your file system after the fact,
because there are just so many interactions between the file system,
caches, and the transaction manager.

Charles

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/