On 13/02/2024 12:19, David Hildenbrand wrote:
On 13.02.24 13:06, Ryan Roberts wrote:
On 12/02/2024 20:38, Ryan Roberts wrote:
[...]
+static inline bool mm_is_user(struct mm_struct *mm)
+{
+ /*
+ * Don't attempt to apply the contig bit to kernel mappings, because
+ * dynamically adding/removing the contig bit can cause page faults.
+ * These racing faults are ok for user space, since they get serialized
+ * on the PTL. But kernel mappings can't tolerate faults.
+ */
+ return mm != &init_mm;
+}
We also have the efi_mm as a non-user mm, though I don't think we manipulate
that while it is live, and I'm not sure if that needs any special handling.
Well we never need this function in the hot (order-0 folio) path, so I think I
could add a check for efi_mm here with performance implication. It's probably
safest to explicitly exclude it? What do you think?
Oops: This should have read "I think I could add a check for efi_mm here
*without* performance implication"
It turns out that efi_mm is only defined when CONFIG_EFI is enabled. I can do
this:
return mm != &init_mm && (!IS_ENABLED(CONFIG_EFI) || mm != &efi_mm);
Is that acceptable? This is my preference, but nothing else outside of efi
references this symbol currently.
Or perhaps I can convince myself that its safe to treat efi_mm like userspace.
There are a couple of things that need to be garanteed for it to be safe:
- The PFNs of present ptes either need to have an associated struct page or
need to have the PTE_SPECIAL bit set (either pte_mkspecial() or
pte_mkdevmap())
- Live mappings must either be static (no changes that could cause
fold/unfold
while live) or the system must be able to tolerate a temporary fault
Mark suggests efi_mm is not manipulated while live, so that meets the latter
requirement, but I'm not sure about the former?
I've gone through all the efi code, and conclude that, as Mark suggests, the
mappings are indeed static. And additionally, the ptes are populated using only
the _private_ ptep API, so there is no issue here. As just discussed with Mark,
my prefereence is to not make any changes to code, and just add a comment
describing why efi_mm is safe.
Details:
* Registered with ptdump
* ptep_get_lockless()
* efi_create_mapping -> create_pgd_mapping … -> init_pte:
* __ptep_get()
* __set_pte()
* efi_memattr_apply_permissions -> efi_set_mapping_permissions … ->
set_permissions
* __ptep_get()
* __set_pte()
Sound good. We could add some VM_WARN_ON if we ever get the efi_mm via the
"official" APIs.
We could, but that would lead to the same linkage issue, which I'm trying to
avoid in the first place:
VM_WARN_ON(IS_ENABLED(CONFIG_EFI) && mm == efi_mm);
This creates new source code dependencies, which I would rather avoid if possible.