Re: [PATCH v2 1/3] pagemap: add mmap-exclusive bit for marking pages mapped only here

From: Mark Williamson
Date: Tue May 12 2015 - 08:05:53 EST


Hi Konstantin,

I hope you won't mind me thinking out loud here on the idea of adding
a flag to the v2 pagemap fields... From a kernel PoV, I agree that
this seems like the cleanest approach. However, with my application
developer hat on:

1. I was hoping we'd be able to backport a compatible fix to older
kernels that might adopt the pagemap permissions change. Using the V2
format flags rules out doing this for kernels that are too old to have
soft-dirty, I think.
2. From our software's PoV, I feel it's worth noting that it doesn't
strictly fix ABI compatibility, though I realise that's probably not
your primary concern here. We'll need to modify our code to write the
clear_refs file but that change is OK for us if it's the preferred
solution.

In the patches I've been playing with, I was considering putting the
Exclusive flag in the now-unused PFN field of the pagemap entries.
Since we're specifically trying to work around for the lack of PFN
information, would there be any appetite for mirroring this flag
unconditionally into the now-empty PFN field (i.e. whether using v1 or
v2 flags) when accessed by an unprivileged process?

I realise it's ugly from a kernel PoV and I feel a little bad for
suggesting it - but it would address points 1 and 2 for us (our
existing code just looks for changes in the pagemap entry, so sticking
the flag in there would cause it to do the right thing).

I'm sorry to raise application-specific issues at this point; I
appreciate that your primary concern is to improve the kernel and
technically I like the approach that you've taken! I'll try and
provide more code-oriented feedback once I've tried out the changes.

Thanks,
Mark

On Tue, May 12, 2015 at 10:43 AM, Konstantin Khlebnikov
<khlebnikov@xxxxxxxxxxxxxx> wrote:
> This patch sets bit 56 in pagemap if this page is mapped only once.
> It allows to detect exclusively used pages without exposing PFN:
>
> present file exclusive state
> 0 0 0 non-present
> 1 1 0 file page mapped somewhere else
> 1 1 1 file page mapped only here
> 1 0 0 anon non-CoWed page (shared with parent/child)
> 1 0 1 anon CoWed page (or never forked)
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> Link: lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@xxxxxxxxxxxxxx
>
> ---
>
> v2:
> * handle transparent huge pages
> * invert bit and rename shared -> exclusive (less confusing name)
> ---
> Documentation/vm/pagemap.txt | 3 ++-
> fs/proc/task_mmu.c | 10 ++++++++++
> tools/vm/page-types.c | 12 ++++++++++++
> 3 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
> index 6bfbc172cdb9..3cfbbb333ea1 100644
> --- a/Documentation/vm/pagemap.txt
> +++ b/Documentation/vm/pagemap.txt
> @@ -16,7 +16,8 @@ There are three components to pagemap:
> * Bits 0-4 swap type if swapped
> * Bits 5-54 swap offset if swapped
> * Bit 55 pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
> - * Bits 56-60 zero
> + * Bit 56 page exlusively mapped
> + * Bits 57-60 zero
> * Bit 61 page is file-page or shared-anon
> * Bit 62 page swapped
> * Bit 63 page present
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 6dee68d013ff..29febec65de4 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -982,6 +982,7 @@ struct pagemapread {
> #define PM_STATUS2(v2, x) (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
>
> #define __PM_SOFT_DIRTY (1LL)
> +#define __PM_MMAP_EXCLUSIVE (2LL)
> #define PM_PRESENT PM_STATUS(4LL)
> #define PM_SWAP PM_STATUS(2LL)
> #define PM_FILE PM_STATUS(1LL)
> @@ -1074,6 +1075,8 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
>
> if (page && !PageAnon(page))
> flags |= PM_FILE;
> + if (page && page_mapcount(page) == 1)
> + flags2 |= __PM_MMAP_EXCLUSIVE;
> if ((vma->vm_flags & VM_SOFTDIRTY))
> flags2 |= __PM_SOFT_DIRTY;
>
> @@ -1119,6 +1122,13 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> else
> pmd_flags2 = 0;
>
> + if (pmd_present(*pmd)) {
> + struct page *page = pmd_page(*pmd);
> +
> + if (page_mapcount(page) == 1)
> + pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
> + }
> +
> for (; addr != end; addr += PAGE_SIZE) {
> unsigned long offset;
> pagemap_entry_t pme;
> diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
> index 8bdf16b8ba60..3a9f193526ee 100644
> --- a/tools/vm/page-types.c
> +++ b/tools/vm/page-types.c
> @@ -70,9 +70,12 @@
> #define PM_PFRAME(x) ((x) & PM_PFRAME_MASK)
>
> #define __PM_SOFT_DIRTY (1LL)
> +#define __PM_MMAP_EXCLUSIVE (2LL)
> #define PM_PRESENT PM_STATUS(4LL)
> #define PM_SWAP PM_STATUS(2LL)
> +#define PM_FILE PM_STATUS(1LL)
> #define PM_SOFT_DIRTY __PM_PSHIFT(__PM_SOFT_DIRTY)
> +#define PM_MMAP_EXCLUSIVE __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
>
>
> /*
> @@ -100,6 +103,8 @@
> #define KPF_SLOB_FREE 49
> #define KPF_SLUB_FROZEN 50
> #define KPF_SLUB_DEBUG 51
> +#define KPF_FILE 62
> +#define KPF_MMAP_EXCLUSIVE 63
>
> #define KPF_ALL_BITS ((uint64_t)~0ULL)
> #define KPF_HACKERS_BITS (0xffffULL << 32)
> @@ -149,6 +154,9 @@ static const char * const page_flag_names[] = {
> [KPF_SLOB_FREE] = "P:slob_free",
> [KPF_SLUB_FROZEN] = "A:slub_frozen",
> [KPF_SLUB_DEBUG] = "E:slub_debug",
> +
> + [KPF_FILE] = "F:file",
> + [KPF_MMAP_EXCLUSIVE] = "1:mmap_exclusive",
> };
>
>
> @@ -452,6 +460,10 @@ static uint64_t expand_overloaded_flags(uint64_t flags, uint64_t pme)
>
> if (pme & PM_SOFT_DIRTY)
> flags |= BIT(SOFTDIRTY);
> + if (pme & PM_FILE)
> + flags |= BIT(FILE);
> + if (pme & PM_MMAP_EXCLUSIVE)
> + flags |= BIT(MMAP_EXCLUSIVE);
>
> return flags;
> }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/