Re: Pull request for FS-Cache, including NFS patches

From: Trond Myklebust
Date: Fri Dec 19 2008 - 07:35:24 EST


On Thu, 2008-12-18 at 18:44 -0800, Andrew Morton wrote:
> On Fri, 19 Dec 2008 02:27:24 +0000 David Howells <dhowells@xxxxxxxxxx> wrote:
>
> > > Are any distros pushing for this? Or shipping it? If so, are they
> > > able to weigh in and help us with this quite difficult decision?
> >
> > We (Red Hat) have shipped it in RHEL-5 and some Fedora releases. Doing so is
> > quite an effort, though, precisely because the code is not yet upstream. We
> > have customers using it and are gaining more customers who want it. There
> > even appear to be CentOS users using it (or at least complaining when it
> > breaks).
>
> That's useful news.
>
> >
> > I don't know what will convince you. I've given you theoretical reasons why
> > caching ought to be useful; I've backed up the ones I've implemented with
> > benchmarks; I've given you examples of what our customers are doing with it or
> > want to do with it.
>
> Was that information captured/maintained somewhere? It really is important (I
> think) for something of this magnitude.
>
> > Please help me understand what else you want.
>
> I want to be able to have an answer when someone asks me "why was all that
> stuff merged". One which I can believe.
>
> > Do you perhaps want the netfs maintainers (such as Trond) to say that it's
> > necessary?
>
> Of course, their opinions (and supporting explanations) would be valuable.

OK. I do agree that persistent caches can be (sometimes very!) useful
for certain workloads. As far as I'm concerned, it is mainly a tool to
help scale up the number of clients per server.

One interesting use case that I didn't see David mention is for cluster
boot up. In a lot of the HPC clustered set-ups there tend to be a number of
'hot' files that all clients need to access at roughly the same time in
the boot cycle. Pre-loading these files into the persistent cache before
booting the cluster is one way to solve this problem. Server replication
and/or copying the files to local storage on the clients are other
solutions.

The advantage the persistent cache confers over the two other solutions
is that it simplifies the data management: if you do need to change one
or two of those hot files on the server, then the clients will
automatically update their caches (both the page cache and the
persistent cache) without requiring any further action from the cluster
administrator. The disadvantage is that cachefs doesn't yet appear to
have a tool to select those files that are hot and are therefore best
suited to cache (please correct me if I'm wrong, David). The fact that a
file is 'hot' on some given client is not necessarily equivalent to
saying that it is hot on the server and vice versa. Are there any plans
to at some point introduce a tool to manage the persistent cache?

Cheers
Trond
