Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

From: Avi Kivity
Date: Mon Mar 15 2010 - 05:28:19 EST

Next message: Justin Piszcz: "2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>]assfail+0x1b/0x20 SS:ESP 0068:f687bf14"
Previous message: Michael S. Tsirkin: "Re: [PATCH v1 2/3] Provides multiple submits and asynchronous notifications."
In reply to: Balbir Singh: "Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter"
Next in thread: Balbir Singh: "Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter"

On 03/15/2010 11:17 AM, Balbir Singh wrote:

* Avi Kivity<avi@xxxxxxxxxx> [2010-03-15 10:27:45]:

On 03/15/2010 10:07 AM, Balbir Singh wrote:

* Avi Kivity<avi@xxxxxxxxxx> [2010-03-15 09:48:05]:

On 03/15/2010 09:22 AM, Balbir Singh wrote:

Selectively control Unmapped Page Cache (nospam version)

From: Balbir Singh<balbir@xxxxxxxxxxxxxxxxxx>

This patch implements unmapped page cache control via preferred
page cache reclaim. The current patch hooks into kswapd and reclaims
page cache if the user has requested unmapped page control.
This is useful in the following scenario:

- In a virtualized environment with cache!=none, we see
double caching (one copy in the host and one in the guest). As
we try to scale guests, cache usage across the system grows.
The goal of this patch is to reclaim page cache when Linux is running
as a guest and get the host to hold the page cache and manage it.
There might be temporary duplication, but in the long run, memory
in the guests would be used for mapped pages.
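
To make "unmapped page cache" concrete, here is a rough sketch of how one might estimate it from inside a guest. It is not part of the patch. It assumes the `Cached` and `Mapped` fields of /proc/meminfo are a good enough proxy (`Cached` also counts shmem, so the result is only approximate), and the function name is made up for illustration.

```python
def unmapped_page_cache_kb(meminfo_text):
    """Estimate unmapped page cache in kB as Cached - Mapped.

    Assumption: meminfo_text has the /proc/meminfo layout,
    i.e. lines such as "Cached:   123456 kB".
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    # Clamp at zero in case Mapped momentarily exceeds Cached.
    return max(fields["Cached"] - fields["Mapped"], 0)
```

On a Linux guest, passing the contents of /proc/meminfo to this function gives a ballpark figure for how much cache the proposed reclaim path could target.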

Well, for a guest, host page cache is a lot slower than guest page cache.

Yes, it is a virtio call away, but is the cost of paying twice in
terms of memory acceptable?

Usually, it isn't, which is why I recommend cache=off.

cache=off works for filesystems that support *direct I/O*, and my concern is that
one of the side effects is that idle VMs can consume a lot of memory
(assuming all the memory is available to them). As the number of VMs
grows, they could cache a whole lot of memory. In my experiments I
found that the total amount of memory cached far exceeded the amount
of mapped memory when we had idle VMs. The philosophy of this
patch is to move the caching to the _host_ and let the host maintain
the cache instead of the guest.
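
The memory cost being argued over can be put as back-of-the-envelope arithmetic. The sketch below is only a crude upper bound: it assumes no data is shared between guests and ignores eviction, and the guest count and per-guest cache size are invented numbers, not measurements from this thread.

```python
def total_cache_mb(guests, cached_per_guest_mb, guest_caches, host_caches):
    """Memory spent on page cache across the whole host, in MB.

    Assumption: every guest caches distinct data, so nothing is shared.
    """
    copies = int(guest_caches) + int(host_caches)
    return guests * cached_per_guest_mb * copies


# Hypothetical setup: 8 guests, each with 512 MB of cached file data.
double_cached = total_cache_mb(8, 512, guest_caches=True, host_caches=True)
host_only = total_cache_mb(8, 512, guest_caches=False, host_caches=True)
guest_only = total_cache_mb(8, 512, guest_caches=True, host_caches=False)

print(double_cached, host_only, guest_only)  # 8192 4096 4096
```

Under these assumptions, caching in only one place halves the footprint whichever side holds the cache, which is why the remaining question is whether the host copy can be shared.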

That's only beneficial if the cache is shared. Otherwise, you could use the balloon to evict cache when memory is tight.

Shared cache is mostly a desktop thing where users run similar workloads. For servers, it's much less likely. So a modified guest doesn't help a lot here.

One of the reasons I created a boot
parameter was to deal with selective enablement for cases where
memory is the most important resource being managed.

I do see a hit in performance with my results (please see the data
below), but the savings are quite large. The other solution mentioned
in the TODOs is to have the balloon driver invoke this path. The
sysctl also allows the guest to tune the amount of unmapped page cache
if needed.

The knobs are for:

1. Selective enablement
2. Selective control of the % of unmapped pages
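
The two knobs can be read as a simple threshold check. The following is a minimal sketch of that decision only, not the patch's kswapd code; the function and parameter names are hypothetical and do not correspond to the actual boot parameter or sysctl names.

```python
def unmapped_pages_to_reclaim(enabled, total_pages, unmapped_pages,
                              max_unmapped_percent):
    """Return how many unmapped page cache pages are over the limit.

    enabled              -- knob 1: selective enablement (hypothetical name)
    max_unmapped_percent -- knob 2: allowed % of unmapped pages (hypothetical name)
    """
    if not enabled:
        return 0  # feature is off: leave the page cache alone
    limit = total_pages * max_unmapped_percent // 100
    return max(unmapped_pages - limit, 0)


# With 1000 pages and a 10% limit, 300 unmapped pages is 200 over the limit.
print(unmapped_pages_to_reclaim(True, 1000, 300, 10))  # 200
```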

An alternative path is to enable KSM for page cache. Then we have
direct read-only guest access to host page cache, without any guest
modifications required. That will be pretty difficult to achieve
though - will need a readonly bit in the page cache radix tree, and
teach all paths to honour it.

Yes, it is; I've taken a quick look. I am not sure de-duplication
would be the best approach; maybe dropping the page in the page cache
would be a good first step. Data consistency would be much easier to
maintain that way: as long as the guest is not writing frequently to
that page, we don't need the page cache in the host.

Trimming the host page cache should happen automatically under pressure. Since the page is cached by the guest, it won't be re-read, so the host page is not frequently used and is then dropped.
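
A toy simulation of that argument, assuming both caches behave as plain LRU (the kernel's real reclaim uses more elaborate LRU lists, so this only illustrates the reasoning). The class name, helper, and cache sizes are invented for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache keyed by page name (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def read(self, page):
        """Return True on a hit; on a miss, insert and evict the LRU page if full."""
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            return True
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # drop the least recently used page
        return False


def guest_read(guest, host, page):
    """A guest hit never reaches the host; a guest miss is forwarded to it."""
    if not guest.read(page):
        host.read(page)


guest = LRUCache(capacity=4)
host = LRUCache(capacity=2)

guest_read(guest, host, "hot")      # first read misses everywhere: both cache it
for _ in range(10):
    guest_read(guest, host, "hot")  # guest hits: the host sees none of these
guest_read(guest, host, "a")        # unrelated reads put pressure on the host
guest_read(guest, host, "b")

print("hot" in guest.pages, "hot" in host.pages)  # True False
```

The page the guest uses most ends up being the first one the host drops, because from the host's point of view it was read once and never again.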

--
error compiling committee.c: too many arguments to function
