Re: Soft-Updates for Linux ?

From: Rik van Riel (riel@conectiva.com.br)
Date: Sun Oct 01 2000 - 20:48:37 EST


On Mon, 2 Oct 2000, Daniel Phillips wrote:
> Rik van Riel wrote:
> > On Sun, 1 Oct 2000, bert hubert wrote:
> > >
> > > If Rik gets some kind of memory pressure callback API in the
> > > kernel, there is no theoretical reasons why the journalling
> > > filesystems couldn't be merged safely.
> >
> > Once the VM is stable with the current feature set and OOM
> > handling has been added, I'll probably look at the support
> > code for the journaling filesystems.
>
> What I've seen proposed is a mechanism where the VM can say
> 'flush this page' to a filesystem and the filesystem can then go
> ahead and do what it wants, including flushing the page,
> flushing some other page, or not doing anything and just being
> part of the problem. I'm having trouble seeing that as a clear
> way for the VM to communicate what it wants. Why is it
> interested in *that* page in particular?

Because VM is about page replacement.

In order to keep system performance good, you want to evict
the least frequently+recently used page.

However, if that page is unfreeable because of write ordering
constraints, we have to ask the filesystem if it could do
something to /make/ that page flushable (and freeable).

> What should we read into its interest in a particular page?

VM: "I want to evict this data from memory and use the page
     for something else."

> If that page can't be written just now, this mechanism provides
> no help in finding a suitable substitute.

That's not the point. The filesystem might have to /flush/
a few pages in order to make this one freeable, but the
data in those other pages won't be removed from memory by
the VM subsystem.

All the VM subsystem does is free up /this/ page. The other
ones are left alone by the VM subsystem.

> Instead of calling the filesystem repeatedly, how about just
> telling it the bad news: the amount of cache the VM thinks the
> filesystem should be using for a particular superblock, and
> perhaps the amount the VM thinks the filesystem is using now.

NOOOOOOOOOOOOOOOOO!!!!!

If you want the MM and the FS to need intimate knowledge
about each other I'm out of here...

> Passing in the target page seems to be an attempt at
> communicating some of the LRU information that the VM maintains.
> If so, this is a very low-bandwidth way of doing that.

The overhead in doing this is absolutely negligable compared
to the cost of actually writing something to disk. Also, you
really don't want to have knowledge about the write ordering
constraints in the MM subsystem.

> I have to admit I haven't studied the VM the way I should, I'm
> still trying to preserve the fiction that one can write a
> filesystem without getting involved in the memory manager.

I'm trying to make that reality.

All the filesystem needs to know is this:
1) internal write ordering constraints
2) if the VM asks the FS to flush out a page, we
   must be able to deal with that request
   (but you need this functionality anyway, for fsync)
3) the FS has the freedom to do its own optimisations when
   the VM asks something, since VM calls are advisory

All the VM layer needs to know:
1) the VM layer can free clean pages
2) don't mess around with filesystem pages but ask
   the filesystem instead
3) don't rely on the filesystem actually doing what
   we ask ... leave the filesystem the freedom to do
   what it needs to do

Where in this idea is there any crossover between filesystems
and the VM?

(or do you really think that the current system of buffercache
black magic is better?)

> But here is a half-baked idea: how about exporting the page aging
> mechanism so a filesystem can age its own pages.

Rubbish. Absolute crap.

In order for page replacement to work well, you want
to do page replacement on a global (system-wide) level.
Your proposal essentially moves page replacement to the
local level, which completely breaks even in an example
as simple as this:

Suppose you have 2 different filesystems and a bunch
of applications in your system. What you want to do
in this case is evict the least recently+frequently
used page from the memory.

Your idea would cause the system to evict the least
recently+frequently used page from each subsystem.
This just doesn't work if one filesystem is busy while
another one isn't.

Example:
- you untar some source code from CD
- then you start compiling

The compiler will allocate memory, and cause the system
to evict little bits of memory from the iso9660 cache,
the ext2 cache and from running programs.

This is close to the absolute worst thing you can do.

What you want is for the iso9660 fs cache of the tarball
to be evicted, since that isn't going to be used again
(and hasn't been used for the longest time).

Local mechanisms simply CANNOT make page replacement work
well on a system-wide level.

regards,

Rik

--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Oct 07 2000 - 21:00:08 EST