Re: [PATCH v7 2/4] mm: introduce new flag to indicate wc safe

From: David Hildenbrand
Date: Mon Feb 12 2024 - 08:13:36 EST


On 11.02.24 18:47, ankita@xxxxxxxxxx wrote:
From: Ankit Agrawal <ankita@xxxxxxxxxx>

Generalizing S2 setting from DEVICE_nGnRE to NormalNc for non PCI
devices may be problematic. E.g. GICv2 vCPU interface, which is
effectively a shared peripheral, can allow a guest to affect another
guest's interrupt distribution. The issue may be solved by limiting
the relaxation to mappings that have a user VMA. Still there is
insufficient information and uncertainity in the behavior of

s/uncertainity/uncertainty/

non PCI drivers.

Add a new flag VM_ALLOW_ANY_UNCACHED to indicate KVM that the device
is WC capable and these S2 changes can be extended to it. KVM can use
this flag to activate the code.


MM people will only stumble over this commit at some point, looking for details. It might make sense to add a bit more detail on the underlying problem (user space tables vs. stage-1 vs. stage-2) and why we want to have a different mapping in user space compared to stage-1.

Then, describe that the VMA flag was found to be the simplest and cleanest way to communicate this information from VFIO to KVM.

Suggested-by: Catalin Marinas <catalin.marinas@xxxxxxx>
Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
---
include/linux/mm.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..59576e56c58b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -391,6 +391,20 @@ extern unsigned int kobjsize(const void *objp);
# define VM_UFFD_MINOR VM_NONE
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+/*
+ * This flag is used to connect VFIO to arch specific KVM code. It
+ * indicates that the memory under this VMA is safe for use with any
+ * non-cachable memory type inside KVM. Some VFIO devices, on some
+ * platforms, are thought to be unsafe and can cause machine crashes
+ * if KVM does not lock down the memory type.
+ */
+#ifdef CONFIG_64BIT
+#define VM_ALLOW_ANY_UNCACHED_BIT 39
+#define VM_ALLOW_ANY_UNCACHED BIT(VM_ALLOW_ANY_UNCACHED_BIT)
+#else
+#define VM_ALLOW_ANY_UNCACHED VM_NONE
+#endif
+
/* Bits set in the VMA until the stack is in its final location */
#define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ | VM_STACK_EARLY)

It's not perfect (very VFIO <-> KVM specific right now, VMA flags feel a bit wrong), but it is certainly easier and cleaner than any of the alternatives I could think of.
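
For readers landing here without the rest of the series, a rough user-space sketch of how such a flag could carry the information from the producer (VFIO) to the consumer (arch KVM code) may help. This is not code from the series. The struct, the enum and both helper functions are hypothetical stand-ins, vm_flags is modelled as a 64-bit integer, and the UINTPTR_MAX check only approximates CONFIG_64BIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins for kernel definitions; these are not the kernel headers. */
typedef uint64_t vm_flags_t;
#define VM_NONE 0ULL
#define BIT(nr) (1ULL << (nr))

/*
 * Mirrors the shape of the patch: the flag only exists on 64-bit builds.
 * Assumption: UINTPTR_MAX is used here in place of CONFIG_64BIT.
 */
#if UINTPTR_MAX > 0xffffffffULL
#define VM_ALLOW_ANY_UNCACHED_BIT 39
#define VM_ALLOW_ANY_UNCACHED BIT(VM_ALLOW_ANY_UNCACHED_BIT)
#else
#define VM_ALLOW_ANY_UNCACHED VM_NONE
#endif

/* Hypothetical, stripped-down VMA: only the flags word matters here. */
struct mock_vma {
	vm_flags_t vm_flags;
};

/* Hypothetical stage-2 memory types, named after the commit message. */
enum s2_memtype {
	S2_DEVICE_nGnRE,
	S2_NORMAL_NC,
};

/* Producer side (hypothetical): the driver declares the mapping WC safe. */
static inline void mock_vfio_mark_wc_safe(struct mock_vma *vma)
{
	assert(vma);
	vma->vm_flags |= VM_ALLOW_ANY_UNCACHED;
}

/*
 * Consumer side (hypothetical): relax the stage-2 type only when the
 * flag is set; otherwise keep the locked-down Device type.
 */
static inline enum s2_memtype mock_kvm_pick_s2_memtype(const struct mock_vma *vma)
{
	assert(vma);
	return (vma->vm_flags & VM_ALLOW_ANY_UNCACHED) ? S2_NORMAL_NC
						       : S2_DEVICE_nGnRE;
}
```

Where the flag falls back to VM_NONE, the mask test is always false, so the consumer never relaxes the memory type on such builds. That is the conservative default the #else branch in the patch gives.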

Acked-by: David Hildenbrand <david@xxxxxxxxxx>

--
Cheers,

David / dhildenb