4level page tables for Linux

From: Andi Kleen
Date: Tue Oct 12 2004 - 09:00:51 EST



I released a 4level page table patch for 2.6.9rc4. It is available
from ftp://ftp.suse.com/pub/people/ak/4level/4level-2.6.9rc4-1.gz

It changes the Linux MM, which currently supports only 3 levels of
page tables, to support a fourth level (PML4). This is needed to
exceed 512GB of virtual address space per process on x86-64.
People have been running into the 512GB limit when mmapping large files.
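
For illustration, the 512GB figure can be reproduced with a little
arithmetic. The sketch below is not from the patch; it assumes the usual
x86-64 layout of 4KB pages (12 offset bits) and 512 entries per table
(9 index bits per level), which the mail does not spell out. The names
va_bits() and va_size() are made up for this example.

```c
#include <stdint.h>

/* Assumed x86-64 background facts, not stated in the mail: */
#define PAGE_SHIFT      12  /* 4KB pages -> 12 bits of page offset */
#define BITS_PER_LEVEL  9   /* 512 entries per table -> 9 index bits */

/* Number of virtual address bits translated by `levels` table levels. */
static unsigned va_bits(unsigned levels)
{
    return PAGE_SHIFT + levels * BITS_PER_LEVEL;
}

/* Size in bytes of the virtual address range those bits can cover. */
static uint64_t va_size(unsigned levels)
{
    return (uint64_t)1 << va_bits(levels);
}
```

Under these assumptions va_bits(3) is 39, so three levels cover 2^39
bytes, which is the 512GB limit, while va_bits(4) is 48.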

The patch extends all the code in the VM that walks the 3level
hierarchy to handle the fourth level too.
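
As a rough picture of what "walking" one more level means, here is a toy
user-space model. It is only a sketch, not code from the patch: the
struct and function names (toy_table, toy_index, toy_walk) are
hypothetical, and it assumes 4KB pages with 512 slots per table. A
walker written this way handles three or four levels with the same loop.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12                  /* assumed: 4KB pages */
#define LEVEL_BITS  9                   /* assumed: 512 slots per table */
#define ENTRIES     (1u << LEVEL_BITS)

/* Toy table: each slot points to the next lower table, or, at the
 * lowest level, to the mapped page. NULL means "not present". */
struct toy_table {
    void *slot[ENTRIES];
};

/* Slot index used at `level` (0 = lowest level, levels-1 = top). */
static unsigned toy_index(uint64_t addr, unsigned level)
{
    return (addr >> (PAGE_SHIFT + level * LEVEL_BITS)) & (ENTRIES - 1);
}

/* Walk `levels` levels starting at `top`.
 * Returns the page pointer, or NULL if any level is not present. */
static void *toy_walk(struct toy_table *top, unsigned levels, uint64_t addr)
{
    void *cur = top;

    for (unsigned level = levels; level-- > 0; ) {
        if (cur == NULL)
            return NULL;
        cur = ((struct toy_table *)cur)->slot[toy_index(addr, level)];
    }
    return cur;
}
```

Calling toy_walk() with levels = 3 models the old hierarchy; levels = 4
adds one outer lookup in front of the same three steps.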

The patch changes x86-64 to now support 47bits (128TB) of virtual
address space per process.
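
To see how a 47bit user address maps onto four levels, the following
sketch splits an address into its per-level indices. It is an
illustration only: it assumes 4KB pages and 9 index bits per level, and
the struct va_parts / split_va() names are invented here. The lower
three fields are labeled with the existing Linux level names (pgd, pmd,
pte) for readability.

```c
#include <stdint.h>

#define PAGE_SHIFT  12   /* assumed: 4KB pages */
#define LEVEL_BITS  9    /* assumed: 512 entries per table */
#define LEVEL_MASK  ((1u << LEVEL_BITS) - 1)

/* Per-level indices of one virtual address (hypothetical helper type). */
struct va_parts {
    unsigned pml4;    /* index into the new fourth level */
    unsigned pgd;
    unsigned pmd;
    unsigned pte;
    unsigned offset;  /* byte offset inside the 4KB page */
};

static struct va_parts split_va(uint64_t addr)
{
    struct va_parts p;

    p.offset = addr & ((1u << PAGE_SHIFT) - 1);
    p.pte    = (addr >> PAGE_SHIFT) & LEVEL_MASK;
    p.pmd    = (addr >> (PAGE_SHIFT + 1 * LEVEL_BITS)) & LEVEL_MASK;
    p.pgd    = (addr >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK;
    p.pml4   = (addr >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK;
    return p;
}
```

For the highest 47bit address, 0x00007fffffffffff, this gives a PML4
index of 255 and 511 for each lower level, so under these assumptions a
process uses PML4 slots 0 to 255, i.e. 256 slots of 512GB each, which
adds up to 128TB.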

The changes are fortunately quite localized. Only mm/* had to change
(mostly in straightforward ways); drivers etc. were already well
isolated from the page tables by the get_user_pages() function.
The only exception was DRM, which needed a small patch.

Excluding the i386/x86-64 architecture-specific parts, the patch looks like:

drivers/char/drm/drm_memory.h        |   3
fs/exec.c                            |   6
include/asm-generic/nopml4-page.h    |  11
include/asm-generic/nopml4-pgalloc.h |  21 +
include/asm-generic/nopml4-pgtable.h |  39 ++
include/asm-generic/pgtable.h        |   2
include/asm-generic/tlb.h            |   6
include/linux/init_task.h            |   2
include/linux/mm.h                   |  10
include/linux/sched.h                |   2
kernel/fork.c                        |   6
mm/fremap.c                          |  18 -
mm/memory.c                          | 584 ++++++++++++++++++++++++-----------
mm/mempolicy.c                       |  18 -
mm/mmap.c                            |  36 +-
mm/mprotect.c                        |  44 ++
mm/mremap.c                          |  21 +
mm/msync.c                           |  43 ++
mm/rmap.c                            |  21 +
mm/swapfile.c                        |  61 ++-
mm/vmalloc.c                         |  85 ++++-

For architectures with fewer levels of page tables, the code should be mostly a
nop.
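
One way such a level can collapse to a nop is sketched below. This is a
guess at the general technique, not the contents of the nopml4-*.h
headers: all toy_* names and the tiny table geometry are hypothetical.
The top level is "folded" so that its lookup just hands back the table
one level down, and generic code written for four levels still compiles
and runs unchanged.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PGDIR_SHIFT   30  /* hypothetical: one pgd entry covers 1GB */
#define TOY_PTRS_PER_PGD  4   /* hypothetical: tiny top-level table */

typedef struct { uint64_t val; } toy_pgd_t;

/* Folded level: there is no separate pml4 table, so a "pml4 entry"
 * is simply a view of the pgd table itself. */
typedef toy_pgd_t toy_pml4_t;

/* Folded lookup: the address is ignored, the same table is returned. */
static inline toy_pml4_t *toy_pml4_offset(toy_pgd_t *top, uint64_t addr)
{
    (void)addr;
    return top;
}

/* A folded level is always present. */
static inline int toy_pml4_none(const toy_pml4_t *entry)
{
    (void)entry;
    return 0;
}

/* Step down to the real top level and index it normally. */
static inline toy_pgd_t *toy_pml4_pgd_offset(toy_pml4_t *entry, uint64_t addr)
{
    return entry + ((addr >> TOY_PGDIR_SHIFT) & (TOY_PTRS_PER_PGD - 1));
}

/* Generic caller written as if a fourth level always existed. */
static toy_pgd_t *toy_lookup_pgd(toy_pgd_t *top, uint64_t addr)
{
    toy_pml4_t *pml4 = toy_pml4_offset(top, addr);

    if (toy_pml4_none(pml4))
        return NULL;
    return toy_pml4_pgd_offset(pml4, addr);
}
```

With the folded helpers a compiler can reduce toy_lookup_pgd() to a
single index computation, which is the sense in which the extra level
costs nothing here.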

The patch currently only supports x86-64 and i386. Other architectures
will need some changes to compile. The changes to convert an architecture
over are quite straightforward; please see the changes to i386 in
the patch. I renamed all functions that need changing, so anything
that is broken will not compile.

I will need some help to do this conversion because I can only test
i386 and x86-64 easily. I can convert architectures over, but someone
will need to test the result.

The plan is to merge it into -mm* ASAP, once the major architectures
have been ported over.

-Andi