Hi Izik,
Sorry, I've not yet replied to your response of 1 July, nor shall I
right now. Instead, it's more urgent to send you my current KSM rollup,
against 2.6.31-rc2, with which I'm now pretty happy - to the extent
that I've put my signoff to it below.
Though of course it's actually your and Andrea's and Chris's work,
just played around with by me; I don't know what the order of
signoffs should be in the end.
What it mainly lacks is a Documentation file, and more statistics in
sysfs: though we can already see how much is being merged, we don't
see any comparison against how much isn't.
But if you still like the patch below, let's advance to splitting
it up and getting it into mmotm: I have some opinions on the splitup,
and I'll make some suggestions on that tomorrow.
You asked for a full diff against -rc2, but may want some explanation
of differences from what I sent before. The main changes are:-
A reliable PageKsm(), not dependent on the nature of the vma it's in:
it's like PageAnon, but with NULL anon_vma - needs a couple of slight
adjustments outside ksm.c.
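
To illustrate the PageKsm() idea, here is a minimal userspace sketch, not the code from the patch. It assumes the mainline convention that the low bit of page->mapping (PAGE_MAPPING_ANON) marks an anonymous page, with the rest of the word holding the anon_vma pointer. struct page_model and the lowercase helper names are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Low bit of page->mapping marks an anonymous page. */
#define PAGE_MAPPING_ANON 1UL

/* Hypothetical stand-in for struct page: only the field we need here. */
struct page_model {
	void *mapping;
};

/* Anonymous page: the anon bit is set, whatever the pointer part holds. */
static inline int page_anon(const struct page_model *page)
{
	return ((uintptr_t)page->mapping & PAGE_MAPPING_ANON) != 0;
}

/* KSM page: the anon bit is set and the anon_vma pointer part is NULL. */
static inline int page_ksm(const struct page_model *page)
{
	return (uintptr_t)page->mapping == PAGE_MAPPING_ANON;
}
```

The point of the sketch is that the test looks only at the page itself, so it gives the same answer whichever kind of vma the page happens to be mapped into.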
Consequently, there's no reason to go on prohibiting KSM on private
anonymous pages COWed from template file pages in file-backed vmas.
Most of what get_user_pages did for us was unhelpful: now rely on
find_vma and follow_page and handle_mm_fault directly, which allow
us to check VM_MERGEABLE and PageKsm ourselves where needed.
Which eliminates the separate is_present_pte checks, and spares us
from wasting rmap_items on absent ptes.
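
A toy userspace model of that scanning decision might look like the sketch below. It is only an illustration under stated assumptions: struct vma_model, the flag value and both helpers are hypothetical, and page_present() merely imitates the way a non-faulting page lookup finds nothing at an absent pte.

```c
#include <assert.h>
#include <stddef.h>

#define VM_MERGEABLE_MODEL 0x1UL	/* stand-in flag value, not the kernel's */
#define PAGE_SIZE_MODEL    4096UL

/* Hypothetical miniature vma: address range, flags, per-page presence. */
struct vma_model {
	unsigned long start, end;	/* [start, end), page aligned */
	unsigned long flags;
	const unsigned char *present;	/* present[i] != 0: page i is mapped */
};

/* Imitates a non-faulting lookup: nonzero only if a page is there. */
static int page_present(const struct vma_model *vma, unsigned long addr)
{
	return vma->present[(addr - vma->start) / PAGE_SIZE_MODEL] != 0;
}

/* Count how many rmap_items a scan of this vma would need. */
static size_t count_rmap_items(const struct vma_model *vma)
{
	size_t needed = 0;
	unsigned long addr;

	if (!(vma->flags & VM_MERGEABLE_MODEL))
		return 0;		/* area not registered for merging */
	for (addr = vma->start; addr < vma->end; addr += PAGE_SIZE_MODEL)
		if (page_present(vma, addr))
			needed++;	/* absent ptes cost nothing */
	return needed;
}
```

With the presence check folded into the lookup itself, there is no separate pass over the ptes, and an item is only counted where there is a page to describe.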
Which then drew attention to the hyperactive allocation and freeing
of tree_items, "slabinfo -AD" showing huge activity there, even when
idling. It's not much of a problem really, but might cause concern.
And revealed that really those tree_items were a waste of space: they
can be packed within the rmap_items that pointed to them, while still
keeping to the nice cache-friendly 64-byte or 32-byte rmap_item.
(If another field is needed later, we can make rmap_list singly linked.)
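
As a rough check on the size arithmetic, here is one possible packed layout, sketched in userspace C11. It is an assumption for illustration, not necessarily the layout in the patch: every field name and the *_model types are hypothetical, with list_head and the rb-tree node modelled as the usual two-word and three-word structures.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types: two words and three words. */
struct list_head {
	struct list_head *next, *prev;
};
struct rb_node_model {
	unsigned long parent_color;
	struct rb_node_model *right, *left;
};
struct mm_model;			/* opaque, only pointed to */

/* Hypothetical rmap_item with the tree linkage packed inside it. */
struct rmap_item_model {
	struct list_head link;		/* doubly linked rmap_list */
	struct mm_model *mm;
	unsigned long address;
	union {				/* one word, use depends on state */
		unsigned int oldchecksum;
		struct rmap_item_model *next;
	};
	union {				/* three words: tree node or list link */
		struct rb_node_model node;
		struct rmap_item_model *prev;
	};
};

/* Eight words: 64 bytes with 8-byte pointers, 32 bytes with 4-byte ones. */
static_assert(sizeof(struct rmap_item_model) == 8 * sizeof(void *),
	      "rmap_item_model should stay at eight words");
```

In this sketch the doubly linked list_head costs two of the eight words, which is why going singly linked would free one word for a further field.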
mremap move issue sorted, in simplest COW-breaking way. My previous
code to unmerge according to rmap_item->stable was racy/buggy for
two reasons, so ignore rmap_items there now and just scan the ptes.
ksmd used to be running at higher priority: now nice 0.
Moved mm_slot hash functions together; made the hash table smaller
now that it's used less frequently than it was in your design.
More cleanup, making similar things more alike.