Re: [2/3] POHMELFS: Documentation.

From: Sage Weil
Date: Sun Jun 15 2008 - 12:41:58 EST


On Sun, 15 Jun 2008, Evgeniy Polyakov wrote:
> Yes, not only writepage, but any request: if it sends a request and then
> receives the reply (i.e. does a send/recv sequence without being able to
> do something else in between or to let other users send or receive on
> the same socket), then it is synchronous. If it only sends, and someone
> else receives, then different users doing reads, writes, lookups or
> whatever can each send requests, and a different thread receives the
> replies asynchronously and in no particular order; this approach I call
> asynchronous.

Oh, so you just mean that the caller doesn't, say, hold a mutex for the
socket for the duration of the send _and_ recv? I'm kind of shocked that
anyone does that, although I suppose in some cases the protocol
effectively demands it.
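Here is a rough sketch of the distinction in Python. It assumes a toy transport made of two in-process queues instead of a real socket. AsyncClient, Pending and fake_server are hypothetical names, not POHMELFS code. The lock protects only the table of pending request ids and is never held across a send/recv round trip. A single receiver thread matches replies to callers by id, so the toy server is free to answer in reverse order:

```python
import itertools
import queue
import threading


class Pending:
    """Reply slot a caller can block on (hypothetical helper)."""

    def __init__(self):
        self._done = threading.Event()
        self.reply = None

    def complete(self, reply):
        self.reply = reply
        self._done.set()

    def wait(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("no reply received")
        return self.reply


class AsyncClient:
    """Hypothetical client: send() only enqueues a request; one receiver
    thread matches replies to requests by id, in whatever order they arrive."""

    def __init__(self, to_server, from_server):
        self._to_server = to_server      # stands in for the socket's send side
        self._from_server = from_server  # stands in for the socket's recv side
        self._ids = itertools.count(1)
        self._pending = {}               # request id -> Pending
        self._lock = threading.Lock()    # guards _pending; never held across send+recv
        threading.Thread(target=self._receive_loop, daemon=True).start()

    def send(self, op):
        with self._lock:
            req_id = next(self._ids)
            slot = self._pending[req_id] = Pending()
        self._to_server.put((req_id, op))  # returns without waiting for a reply
        return slot

    def _receive_loop(self):
        while True:
            req_id, reply = self._from_server.get()
            with self._lock:
                slot = self._pending.pop(req_id)
            slot.complete(reply)


def fake_server(requests, replies, batch):
    """Toy server: collects `batch` requests, then answers them in reverse order."""
    while True:
        reqs = [requests.get() for _ in range(batch)]
        for req_id, op in reversed(reqs):
            replies.put((req_id, "done:" + op))


if __name__ == "__main__":
    to_srv, from_srv = queue.Queue(), queue.Queue()
    threading.Thread(target=fake_server, args=(to_srv, from_srv, 3), daemon=True).start()
    client = AsyncClient(to_srv, from_srv)
    slots = [client.send(op) for op in ("read", "write", "lookup")]
    print([s.wait(timeout=5) for s in slots])  # ['done:read', 'done:write', 'done:lookup']
```

In the synchronous variant, the caller takes one lock, sends, and blocks in recv before releasing it. That serializes every user of the socket behind the slowest reply.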

> Yes, POHMELFS does writing that way.

Nice. I will definitely be taking a look at that.

> Not exactly. A transaction, in a nutshell, is a wrapper on top of a
> command (or multiple commands if needed, as in writing) which contains
> all the information needed to perform the appropriate action. When the
> user calls read() or 'ls' or write() or whatever, POHMELFS creates a
> transaction for that operation and tries to perform it (unless the
> operation is cached, in which case nothing actually happens). When a
> transaction is submitted, it becomes part of the failover state machine,
> which checks whether data has to be read from a different server,
> written to a new one, or dropped. The original caller may not even know
> which server its data will be received from. If sending a request fails
> in the middle, the whole transaction is redirected to a new server. It
> is also possible to redo a transaction against a different server if
> the server sent us an error (like "I'm busy"), but this functionality
> was dropped in a previous release iirc; it can be resurrected though.
> With a generic transaction tree, callers do not have to bother about
> how to store their requests, how to wait for results and how to
> complete them: transactions do it for them. It is not rocket science,
> but it is an extremely effective and simple way to help manage the
> asynchronous machinery.

Got it. Tracking pending requests in some generic way is definitely key
to making failure handling sane with multiple servers.
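Here is a minimal sketch of that kind of generic tracking in Python, under heavy assumptions. Servers are just names in failover order, and the actual sending is left out. TransactionTable, submit(), complete() and server_failed() are hypothetical; this is not the POHMELFS transaction tree. Each transaction carries everything needed to reissue it, so when a server dies, every pending transaction assigned to it can be moved to the next live one without the caller noticing:

```python
import itertools


class Transaction:
    """Everything needed to (re)issue one operation (hypothetical)."""

    def __init__(self, commands):
        self.commands = list(commands)  # e.g. several write commands for one write()
        self.server = None              # server currently responsible for it
        self.replies = None             # filled in on completion


class TransactionTable:
    """Generic pending-transaction registry with naive failover (hypothetical)."""

    def __init__(self, servers):
        self.servers = list(servers)    # server names, in failover order
        self.alive = set(self.servers)
        self.pending = {}               # transaction id -> Transaction
        self._ids = itertools.count(1)

    def _pick_server(self):
        for name in self.servers:
            if name in self.alive:
                return name
        raise RuntimeError("no live server left")

    def submit(self, trans):
        tid = next(self._ids)
        trans.server = self._pick_server()
        self.pending[tid] = trans
        return tid

    def complete(self, tid, replies):
        trans = self.pending.pop(tid)
        trans.replies = replies
        return trans

    def server_failed(self, name):
        """Redirect every pending transaction on `name` to another live server."""
        self.alive.discard(name)
        moved = []
        for tid, trans in self.pending.items():
            if trans.server == name:
                trans.server = self._pick_server()
                moved.append(tid)
        return moved


if __name__ == "__main__":
    table = TransactionTable(["srv-a", "srv-b"])
    t1 = table.submit(Transaction(["write@0", "write@4096"]))
    t2 = table.submit(Transaction(["lookup /foo"]))
    table.complete(t2, ["ok"])
    print(table.server_failed("srv-a"))   # [1]
    print(table.pending[t1].server)       # srv-b
```

Replaying a whole transaction this way is only safe if its commands are idempotent, or if the new server can detect duplicates. The failed server may already have applied part of the transaction.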

> That was a somewhat old approach; currently inode numbers and things
> like open-by-inode or NFS-style open-by-cookie are not used. I tried to
> describe the caching bits in the documentation I sent, although it's a
> bit rough and likely incomplete :) Feel free to ask if there are some
> gaps there.

So what happens if the user creates a new file and then does a stat(),
which exposes i_ino? Does that value change later? It's not just
open-by-inode/cookie that make ino important.

It looks like the client/server protocol is primarily path-based. What
happens if you do something like

hosta$ cd foo
hosta$ touch foo.txt
hostb$ mv foo bar
hosta$ rm foo.txt

Will hosta realize it really needs to do "unlink /bar/foo.txt"?
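To make the question concrete, here is a toy model in Python of a server namespace. It is not POHMELFS code; Node, Server, unlink_by_path and unlink_by_handle are hypothetical names. A client that remembers only the path string "/foo" for its cwd sends a stale unlink after hostb's rename. A client that remembers the directory's stable id still hits the right file. Whether POHMELFS revalidates such paths on the client is exactly what is being asked; the sketch does not answer that.

```python
import itertools


class Node:
    """Toy directory on the server: its id survives renames, its path does not."""

    def __init__(self, node_id):
        self.id = node_id
        self.entries = {}  # name -> Node (subdirectory) or None (regular file)


class Server:
    """Hypothetical server namespace offering both path- and handle-based unlink."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.by_id = {}
        self.root = self._new_dir()

    def _new_dir(self):
        node = Node(next(self._ids))
        self.by_id[node.id] = node
        return node

    def mkdir(self, parent, name):
        node = self._new_dir()
        parent.entries[name] = node
        return node

    def create(self, parent, name):
        parent.entries[name] = None

    def rename(self, parent, old, new):
        parent.entries[new] = parent.entries.pop(old)

    def resolve_dir(self, path):
        """Walk a directory path from the root, as a path-based server would."""
        node = self.root
        for part in filter(None, path.split("/")):
            child = node.entries.get(part)
            if not isinstance(child, Node):
                raise FileNotFoundError(path)
            node = child
        return node

    def unlink_by_path(self, path):
        dirname, _, name = path.rpartition("/")
        parent = self.resolve_dir(dirname)
        if name not in parent.entries:
            raise FileNotFoundError(path)
        del parent.entries[name]

    def unlink_by_handle(self, dir_id, name):
        parent = self.by_id[dir_id]
        if name not in parent.entries:
            raise FileNotFoundError(name)
        del parent.entries[name]


if __name__ == "__main__":
    srv = Server()
    foo = srv.mkdir(srv.root, "foo")
    cwd_path, cwd_id = "/foo", foo.id      # hosta$ cd foo (remember path and id)
    srv.create(foo, "foo.txt")             # hosta$ touch foo.txt
    srv.rename(srv.root, "foo", "bar")     # hostb$ mv foo bar
    try:                                   # hosta$ rm foo.txt, by stale path
        srv.unlink_by_path(cwd_path + "/foo.txt")
    except FileNotFoundError:
        print("unlink /foo/foo.txt: ENOENT")
    srv.unlink_by_handle(cwd_id, "foo.txt")  # same rm, by directory id
    print(srv.resolve_dir("/bar").entries)   # {}
```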

sage