[PATCH] x86/mm/vmfault: Make vmalloc_fault() handle large pages

From: Toshi Kani
Date: Mon Feb 08 2016 - 18:07:44 EST


Since 4.1, ioremap() supports large page (pud/pmd) mappings on
x86_64 and PAE. vmalloc_fault(), however, still assumes that the
vmalloc range contains only pte mappings.

pgd_ctor() copies the kernel's pgd entries into the user's pgd
during fork(), so user processes share the same page tables for
the kernel ranges. When a run-time call to ioremap() allocates a
new 2nd-level table (pud in 64-bit, pmd in PAE), a user process
must re-sync with the updated kernel pgd entry through
vmalloc_fault().
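
The re-sync described above can be pictured with a small user-space
model. This is only a sketch: struct mm_model, MODEL_PTRS_PER_PGD,
model_fork_copy_pgd() and model_sync_pgd() are hypothetical names
invented for illustration, an entry value of 0 stands in for
pgd_none(), and none of this is kernel code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a tiny top-level table, 0 means "none". */
#define MODEL_PTRS_PER_PGD 8

struct mm_model {
	uint64_t pgd[MODEL_PTRS_PER_PGD];
};

/* Model of fork(): the child starts with a copy of the reference entries. */
static void model_fork_copy_pgd(struct mm_model *child,
				const struct mm_model *ref)
{
	memcpy(child->pgd, ref->pgd, sizeof(child->pgd));
}

/*
 * Model of the sync step on a fault: if the reference entry exists
 * and the process entry is still empty, copy it over.
 * Returns 0 when the entry is in sync, -1 when the reference itself
 * has nothing mapped there.
 */
static int model_sync_pgd(struct mm_model *mm, const struct mm_model *ref,
			  unsigned int idx)
{
	if (ref->pgd[idx] == 0)
		return -1;
	if (mm->pgd[idx] == 0)
		mm->pgd[idx] = ref->pgd[idx];
	return 0;
}
```

In this model, an entry added to the reference table after the copy
is invisible to the process until model_sync_pgd() runs for that
index, which is the situation the fault handler has to repair.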

The following changes are made to vmalloc_fault():

64-bit:
- No change for the sync operation, as set_pgd() takes care of
  huge pages as well.
- Add pud_huge() and pmd_huge() to the validation code to
  handle huge pages.
- Change pud_page_vaddr() to pud_pfn(), since an ioremap range
  is not directly mapped (although the existing if-statement
  still works with the resulting bogus address).
- Change pmd_page() to pmd_pfn(), since an ioremap range is not
  backed by the struct page table (although the existing
  if-statement still works with the resulting bogus address).
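
The per-level validation after these changes can be sketched in user
space as follows. This is a simplified model under stated
assumptions, not the kernel implementation: entries are plain 64-bit
values, only the x86 present bit (bit 0) and PSE bit (bit 7) are
modelled, the pfn mask is a simplification, and check_level(), the
ent_*() helpers and the WALK_* results are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an x86 page-table entry; not kernel code. */
#define ENT_PRESENT	(1ULL << 0)	/* present bit */
#define ENT_PSE		(1ULL << 7)	/* entry maps a large page */
#define ENT_PFN_MASK	(((1ULL << 52) - 1) & ~0xfffULL)	/* simplified */

enum walk_result {
	WALK_DESCEND = 1,	/* entries match, go to the next level */
	WALK_DONE = 0,		/* large page: nothing below to check */
	WALK_FAULT = -1,	/* reference table has no mapping */
	WALK_BUG = -2,		/* process table disagrees with reference */
};

static bool ent_none(uint64_t e)
{
	return e == 0;
}

static bool ent_huge(uint64_t e)
{
	return (e & ENT_PRESENT) && (e & ENT_PSE);
}

static uint64_t ent_pfn(uint64_t e)
{
	return (e & ENT_PFN_MASK) >> 12;
}

/* Compare one level (pud or pmd) of the process walk against the reference. */
static enum walk_result check_level(uint64_t ent, uint64_t ref)
{
	if (ent_none(ref))
		return WALK_FAULT;
	/* Compare by pfn: valid for tables and large pages alike. */
	if (ent_none(ent) || ent_pfn(ent) != ent_pfn(ref))
		return WALK_BUG;
	if (ent_huge(ent))
		return WALK_DONE;
	return WALK_DESCEND;
}
```

Comparing pfns rather than derived virtual addresses or struct page
pointers keeps the check meaningful for ranges that are neither
directly mapped nor backed by struct page, and the huge check stops
the walk before a large-page entry is misread as a pointer to a
lower-level table.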

PAE:
- No change for the sync operation, since the pgd entry at
  index 3 covers the entire vmalloc range and is always valid.
  (A separate change will be needed if this assumption ever
  changes, regardless of the page size.)
- Add pmd_huge() to the validation code to handle huge pages.
  This is only for completeness, since vmalloc_fault() does not
  occur for ioremap'd ranges: their pgd entry is always valid.
  (As a result, I was unable to test this part of the change.)

Reported-by: Henning Schild <henning.schild@xxxxxxxxxxx>
Signed-off-by: Toshi Kani <toshi.kani@xxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
---
When this patch is accepted, please copy it to the stable trees back to 4.1.
---
arch/x86/mm/fault.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index eef44d9..e830c71 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -287,6 +287,9 @@ static noinline int vmalloc_fault(unsigned long address)
if (!pmd_k)
return -1;

+ if (pmd_huge(*pmd_k))
+ return 0;
+
pte_k = pte_offset_kernel(pmd_k, address);
if (!pte_present(*pte_k))
return -1;
@@ -360,8 +363,6 @@ void vmalloc_sync_all(void)
* 64-bit:
*
* Handle a fault on the vmalloc area
- *
- * This assumes no large pages in there.
*/
static noinline int vmalloc_fault(unsigned long address)
{
@@ -403,17 +404,23 @@ static noinline int vmalloc_fault(unsigned long address)
if (pud_none(*pud_ref))
return -1;

- if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
+ if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
BUG();

+ if (pud_huge(*pud))
+ return 0;
+
pmd = pmd_offset(pud, address);
pmd_ref = pmd_offset(pud_ref, address);
if (pmd_none(*pmd_ref))
return -1;

- if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
+ if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
BUG();

+ if (pmd_huge(*pmd))
+ return 0;
+
pte_ref = pte_offset_kernel(pmd_ref, address);
if (!pte_present(*pte_ref))
return -1;