Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio

From: David Hildenbrand
Date: Mon Dec 18 2023 - 09:07:48 EST

Next message: Masahiro Yamada: "Re: [PATCH v6 1/2] depmod: Handle installing modules under a different directory"
Previous message: Johannes Weiner: "Re: [PATCH 5/5] mm/zswap: cleanup zswap_reclaim_entry()"
In reply to: Nadav Amit: "Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio"
Next in thread: Nadav Amit: "Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 17.12.23 20:13, Nadav Amit wrote:

On Nov 24, 2023, at 3:26 PM, David Hildenbrand <david@xxxxxxxxxx> wrote:

5. sub-IDs
==========

To achieve (2), we generate sub-IDs that have the following property,
assuming that our folio has P=folio_nr_pages() pages.
"2 * sub-ID" cannot be represented by the sum of any other *2* sub-IDs
"3 * sub-ID" cannot be represented by the sum of any other *3* sub-IDs
"4 * sub-ID" cannot be represented by the sum of any other *4* sub-IDs
...
"P * sub-ID" cannot be represented by the sum of any other *P* sub-IDs

The sub-IDs are generated in generations, whereby
(1) Generation #0 is the number 0
(2) Generation #N takes all numbers from generations #0..#N-1 and adds
(P + 1)^(N - 1), effectively doubling the number of sub-IDs

Consequently, the smallest number S in gen #N is:
S[#N] = (P + 1)^(N - 1)

The largest number L in gen #N is:
L[#N] = (P + 1)^(N - 1) + (P + 1)^(N - 2) + ... (P + 1)^0 + 0.
-> [geometric sum with "P + 1 != 1"]
= (1 - (P + 1)^N) / (1 - (P + 1))
= (1 - (P + 1)^N) / (-P)
= ((P + 1)^N - 1) / P

David, as you know it took me a while to understand your impressive work.

Hi Nadav,

thanks a bunch for digging into the details of the solution and trying to verify the correctness!

And thanks a lot for highlighting the special case below that I intuitively used for the "intuition" :)

I think that part of what made it hard for me is the presentation and the
formulation of sub-IDs in arithmetic means instead of bit manipulations.

Yes, that needs more work to make it all easier to understand.

Basically, IIUC, you want for order-K pages to add K-1 “0” bits between
each rmap-id bits.

In this case, in x86-64 (with BMI2) there is the PDEP instruction that can
generate these values rather easily with little overhead.

Partially, yes.

What you describe here is a special case that is easier to "grasp", likely a bit faster to calculate, but ends up being less-optimal in space consumption for some cases (especially, order-2 and order-9 folios).

You are describing the special case where "P+1" is a power of two.

Let me explain using the example (P = 3) from the "intuition" further down in this patch description, highlighting the mapping:

RMAP-ID | |Subid |
-----------------------------------
0 | 0000 | 0 | 0000 0000
1 | 0001 | 1 | 0000 0001
2 | 0010 | 4 | 0000 0100
3 | 0011 | 5 | 0000 0101
4 | 0100 | 16 | 0001 0000
5 | 0101 | 17 | 0001 0001
6 | 0110 | 20 | 0001 0100
7 | 0111 | 21 | 0001 0101
8 | 1000 | 64 | 0100 0000
9 | 1001 | 65 | 0100 0001
10 | 1010 | 68 | 0100 0100
11 | 1011 | 69 | 0100 0101
12 | 1100 | 80 | 0101 0100
13 | 1101 | 81 | 0101 0001
14 | 1110 | 84 | 0101 0100
15 | 1111 | 85 | 0101 0101

So going from e.g., 11 -> 69 means adding one zero for each bit, just like you said.

But adding 1 "0" bit is not sufficient for handling order-2 folios (P = 4), only for handling order-1 folios. So what the current approach does is the following (P = 4):

RMAP-ID | | Subid |
-----------------------------------
0 | 0000 | 0 | 0000 0000
1 | 0001 | 1 | 0000 0001
2 | 0010 | 5 | 0000 0101
3 | 0011 | 6 | 0000 0110
4 | 0100 | 25 | 0001 1001
5 | 0101 | 26 | 0001 1010
6 | 0110 | 30 | 0001 1110
7 | 0111 | 31 | 0001 1111
8 | 1000 | 125 | 0111 1101
9 | 1001 | 126 | 0111 1110
10 | 1010 | 130 | 1000 0010
11 | 1011 | 131 | 1000 0011
12 | 1100 | 150 | 1001 0110
13 | 1101 | 151 | 1001 0111
14 | 1110 | 155 | 1001 1011
15 | 1111 | 156 | 1001 1100

Which is not just adding "0"s.

To handle it using your simplification we'd have to add one more "0" bit to be able to use it for order-2 folios. So I'll refine your statement to:
for order-K pages to add K “0” bits between each rmap-id bits.

Then, it's easy to see how going from 24 RMAP-ID bits for an order-2 page would require with that simplification 3*24 = 72bit and would no longer fit into a single 64bit value.

To summarize, with the optimized (currently implemented) scheme, we achieve:
* == order-2: 1x 64bit rmap values
* <= order-5: 2x 64bit rmap values
* <= order-9: 3x 64bit rmap values
* <= order-10: 4x 64bit rmap values
* <= order-12: 5x 64bit rmap values
* <= order-15: 6x 64bit rmap values
* ...

I think, going with the simplification above (to be double-checked), we'd achieve [essentially, subtracting 1 from all orders]:
* == order-1: 1x 64bit rmap values
* <= order-4: 2x 64bit rmap values
* <= order-8: 3x 64bit rmap values
* <= order-9: 4x 64bit rmap values
* <= order-11: 5x 64bit rmap values
* <= order-14: 6x 64bit rmap values
* ...

So especially for order-9 folios we would require 4 instead of 3 64bit values.

Now, going with this simplification as first step makes absolute sense, because it's much more intuitive and eventually a bit easier to implement (although really not significantly IMHO).

One can then easily add the optimization on top later.

I think that besides the easier generation of sub-ids values in this
manner, discussing the matter without the “generations" also makes it
easier to understand the correctness (at least for me).

Yes, that presentation certainly needs improvement. Generating the magic numbers recursively makes it easier to proof the correctness using induction, that's how I started to use that whole "generation" terminology.

--
Cheers,

David / dhildenb

Next message: Masahiro Yamada: "Re: [PATCH v6 1/2] depmod: Handle installing modules under a different directory"
Previous message: Johannes Weiner: "Re: [PATCH 5/5] mm/zswap: cleanup zswap_reclaim_entry()"
In reply to: Nadav Amit: "Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio"
Next in thread: Nadav Amit: "Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]