Re: [patch 0/6] Guest page hinting version 7.

From: Martin Schwidefsky
Date: Mon Mar 30 2009 - 12:34:23 EST


On Mon, 30 Mar 2009 08:54:55 -0700
Dave Hansen <dave@xxxxxxxxxxxxxxxxxx> wrote:

> On Sun, 2009-03-29 at 16:12 +0200, Martin Schwidefsky wrote:
> > > Can we persuade the hypervisor to tell us which pages it decided to page
> > > out and just skip those when we're scanning the LRU?
> >
> > One principle of the whole approach is that the hypervisor does not
> > call into an otherwise idle guest. The cost of scheduling the virtual
> > cpu is just too high. So we would need a means to store the information where
> > the guest can pick it up when it happens to do LRU. I don't think that
> > this will work out.
>
> I didn't mean for it to actively notify the guest. Perhaps, as Rik
> said, have a bitmap where the host can set or clear a bit for the guest to
> see.

Yes, agreed.

> As the guest is scanning the LRU, it checks the structure (or makes an
> hcall or whatever) and sees that the hypervisor has already taken care
> of the page. It skips these pages in the first round of scanning.

As long as we make this optional I'm fine with it. On s390 with the
current implementation that translates to an ESSA call, which is not
exactly cheap: we are talking about more than 100 cycles. The better
solution for us is to age the page with the standard active/inactive
processing.
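
To make the optional check concrete, here is a rough sketch of a guest
scan pass that consults a host-maintained bitmap and skips pages the
host has already taken care of. Everything in it is an assumption for
illustration: the bitmap layout (one bit per pfn, set by the host) and
the names host_took_page() and count_scan_candidates() are hypothetical
and not part of the patch set. It also ignores that a real shared bitmap
would need proper atomic or once-only reads.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical layout: one bit per guest page frame number (pfn).
 * A set bit means "the host has already paged this frame out".
 */
static bool host_took_page(const unsigned long *bitmap, size_t pfn)
{
	return (bitmap[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL;
}

/*
 * Model of the first round of an LRU scan: pages the host has already
 * dealt with are skipped, all others remain reclaim candidates.
 * Returns the number of pages the guest still has to look at.
 */
static size_t count_scan_candidates(const unsigned long *bitmap,
				    const size_t *pfns, size_t n)
{
	size_t candidates = 0;

	for (size_t i = 0; i < n; i++) {
		if (host_took_page(bitmap, pfns[i]))
			continue;	/* host already handled this page */
		candidates++;
	}
	return candidates;
}
```

A plain memory read like this is cheap compared to an instruction such
as ESSA, which is why the cost argument depends on how the information
is exposed to the guest.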

> I do see what you're saying about this saving the page-*out* operation
> on the hypervisor side. It can simply toss out pages instead of paging
> them itself. That's a pretty advanced optimization, though. What would
> this code look like if we didn't optimize to that level?

Why? It is just a simple test in the host's LRU scan. If the page is at
the end of the inactive list AND has the volatile state then don't
bother with writeback, just throw it away. This is the only place where
the host has to check for the page state.
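
As a minimal sketch of that test (not the actual host code), the
decision can be modelled like this. The type and function names
(host_page, host_reclaim_action and the enums) are made up for the
example, and the sketch assumes that any non-volatile page at the tail
has to be written out before its frame can be reused.

```c
#include <stdbool.h>

/* Hypothetical page states as seen by the host. */
enum host_page_state { HOST_PAGE_STABLE, HOST_PAGE_VOLATILE };

/* What the host's LRU scan decides to do with a page. */
enum host_reclaim_action {
	HOST_KEEP,	/* not at the end of the inactive list yet */
	HOST_DISCARD,	/* volatile: guest can recreate it, no I/O needed */
	HOST_WRITEBACK	/* stable: must be paged out before reuse */
};

struct host_page {
	enum host_page_state state;
};

/*
 * The single place where the host checks the guest-provided state:
 * a volatile page at the end of the inactive list is thrown away
 * instead of being written back.
 */
static enum host_reclaim_action
host_reclaim_action(const struct host_page *page, bool at_inactive_tail)
{
	if (!at_inactive_tail)
		return HOST_KEEP;
	if (page->state == HOST_PAGE_VOLATILE)
		return HOST_DISCARD;
	return HOST_WRITEBACK;
}
```

The point of the sketch is that the state check adds one branch to a
path the host walks anyway; the rest of the host's LRU handling stays
as it is.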

> It also occurs to me that the hypervisor could be doing a lot of this
> internally. This whole scheme is about telling the hypervisor about
> pages that we (the kernel) know we can regenerate. The hypervisor
> should know a lot of that information, too. We ask it to populate a
> page with stuff from virtual I/O devices or write a page out to those
> devices. The page remains volatile until something from the guest
> writes to it. The hypervisor could keep a record of how to recreate the
> page as long as it remains volatile and clean.

Unfortunately it is not that simple. There are quite a few reasons why
a page has to be made stable. You'd have to pass that information back
and forth between the guest and the host; otherwise the host will throw
away e.g. an mlocked page because it is still marked as volatile in the
virtual block device.

> That wouldn't cover things like page cache from network filesystems,
> though.

Yes, there are pages with a backing store the host knows nothing about.

> This patch does look like the full monty but I have to wonder what other
> partial approaches are out there.

I am open to suggestions. The simplest partial approach is already
implemented for s390: unused/stable transitions in the buddy allocator.
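
Conceptually that partial approach boils down to two state changes,
sketched here as a toy model. This is not the s390 code: the names
toy_page, toy_free_pages() and toy_alloc_pages() are invented for the
example, and a plain struct field stands in for the per-page state that
the real implementation communicates to the host.

```c
#include <stddef.h>

/* Only the two states the simple approach needs. */
enum toy_page_state { TOY_PAGE_UNUSED, TOY_PAGE_STABLE };

struct toy_page {
	enum toy_page_state state;
};

/*
 * Free path: the guest no longer cares about the contents, so the
 * host may drop the backing frames without saving them.
 */
static void toy_free_pages(struct toy_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		pages[i].state = TOY_PAGE_UNUSED;
}

/*
 * Allocation path: the pages are about to be used again, so the host
 * has to preserve their contents from now on.
 */
static void toy_alloc_pages(struct toy_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		pages[i].state = TOY_PAGE_STABLE;
}
```

With only these two transitions the host can reclaim free guest memory
for nothing, but it learns nothing about clean page cache, which is
what the volatile state in the full patch set is for.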

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.
