Re: [PATCH v1] x86/mm/pat: fix VM_PAT handling in COW mappings

From: David Hildenbrand
Date: Tue Mar 26 2024 - 04:48:46 EST


On 26.03.24 09:33, Ingo Molnar wrote:

* David Hildenbrand <david@xxxxxxxxxx> wrote:

On 12.03.24 20:22, Matthew Wilcox wrote:
On Tue, Mar 12, 2024 at 07:11:18PM +0100, David Hildenbrand wrote:
PAT handling won't do the right thing in COW mappings: the first PTE
(or, in fact, all PTEs) can be replaced during write faults to point at
anon folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.

I guess the first question is: Why do we want to support COW mappings
of VM_PAT areas? What breaks if we just disallow it?

Well, that was my first approach. Then I decided to be less radical (IOW
make my life easier by breaking less user space) and "fix it" with
minimal effort.

Chances of breaking some weird user space is possible, although I believe
for most such mappings MAP_PRIVATE doesn't make too much sense sense.

Nasty COW support for VM_PFNMAP mappings dates back forever. So does PAT
support.

I can try finding digging through some possible user space users
tomorrow.

I'd much prefer restricting VM_PAT areas than expanding support. Could we

Note that we're not expanding support, we're fixing what used to be
possible before but mostly broke silently.

But I agree that we should rather remove these corner cases instead of fixing
them.

try the trivial restriction approach first, and only go with your original
patch if that fails?

Which version would you prefer, I had two alternatives (excluding comment
changes, white-space expected to be broken).


1) Disallow when we would have set VM_PAT on is_cow_mapping()

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0d72183b5dd0..6979912b1a5d 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -994,6 +994,9 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
&& size == (vma->vm_end - vma->vm_start))) {
int ret;
+ if (is_cow_mapping(vma->vm_flags))
+ return -EINVAL;
+
ret = reserve_pfn_range(paddr, size, prot, 0);
if (ret == 0 && vma)
vm_flags_set(vma, VM_PAT);


2) Fallback to !VM_PAT

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0d72183b5dd0..8e97156c9be8 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -990,8 +990,8 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
enum page_cache_mode pcm;
/* reserve the whole chunk starting from paddr */
- if (!vma || (addr == vma->vm_start
- && size == (vma->vm_end - vma->vm_start))) {
+ if (!vma || (!is_cow_mapping(vma->vm_flags) && addr == vma->vm_start &&
+ size == (vma->vm_end - vma->vm_start))) {
int ret;
ret = reserve_pfn_range(paddr, size, prot, 0);



Personally, I'd go for 2).

--
Cheers,

David / dhildenb