Re: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first

From: Jiri Slaby
Date: Fri Jun 30 2023 - 04:28:27 EST


On 30. 06. 23, 8:35, Jiri Slaby wrote:
On 29. 06. 23, 17:30, Suren Baghdasaryan wrote:
On Thu, Jun 29, 2023 at 7:40 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:

Hi,

On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
---
   arch/x86/Kconfig    |  1 +
   arch/x86/mm/fault.c | 36 ++++++++++++++++++++++++++++++++++++
   2 files changed, 37 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..df21fba77db1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86_64
       # Options that are inherently 64-bit kernel only:
       select ARCH_HAS_GIGANTIC_PAGE
       select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+     select ARCH_SUPPORTS_PER_VMA_LOCK
       select ARCH_USE_CMPXCHG_LOCKREF
       select HAVE_ARCH_SOFT_DIRTY
       select MODULES_USE_ELF_RELA
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..e4399983c50c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
   #include <linux/uaccess.h>          /* faulthandler_disabled()      */
   #include <linux/efi.h>                      /* efi_crash_gracefully_on_page_fault()*/
   #include <linux/mm_types.h>
+#include <linux/mm.h>                        /* find_and_lock_vma() */

   #include <asm/cpufeature.h>         /* boot_cpu_has, ...            */
   #include <asm/traps.h>                      /* dotraplinkage, ...           */
@@ -1333,6 +1334,38 @@ void do_user_addr_fault(struct pt_regs *regs,
       }
   #endif

+#ifdef CONFIG_PER_VMA_LOCK
+     if (!(flags & FAULT_FLAG_USER))
+             goto lock_mmap;
+
+     vma = lock_vma_under_rcu(mm, address);
+     if (!vma)
+             goto lock_mmap;
+
+     if (unlikely(access_error(error_code, vma))) {
+             vma_end_read(vma);
+             goto lock_mmap;
+     }
+     fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+     vma_end_read(vma);
+
+     if (!(fault & VM_FAULT_RETRY)) {
+             count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+             goto done;
+     }
+     count_vm_vma_lock_event(VMA_LOCK_RETRY);

This is apparently not strong enough as it causes go build failures like:

[  409s] strconv
[  409s] releasep: m=0x579e2000 m->p=0x5781c600 p->m=0x0 p->status=2
[  409s] fatal error: releasep: invalid p state
[  409s]

[  325s] hash/adler32
[  325s] hash/crc32
[  325s] cmd/internal/codesign
[  336s] fatal error: runtime: out of memory

Hi Jiri,
Thanks for reporting! I'm not familiar with go builds. Could you
please explain the error to me or point me to some documentation to
decipher that error?

Sorry, we are on the same boat -- me neither. It only popped up in our (openSUSE) build system and I only tracked it down by bisection. Let me know if I can try something (like a patch or gathering some debug info).

FWIW, a failed build log:
https://decibel.fi.muni.cz/~xslaby/n/vma/log.txt

and a strace for it:
https://decibel.fi.muni.cz/~xslaby/n/vma/strace.txt

An excerpt from the log:

[ 55s] runtime: marked free object in span 0x7fca6824bec8, elemsize=192 freeindex=0 (bad use of unsafe.Pointer? try -d=checkptr)
[ 55s] 0xc0002f2000 alloc marked
[ 55s] 0xc0002f20c0 alloc marked
[ 55s] 0xc0002f2180 alloc marked
[ 55s] 0xc0002f2240 free unmarked
[ 55s] 0xc0002f2300 alloc marked
[ 55s] 0xc0002f23c0 alloc marked
[ 55s] 0xc0002f2480 alloc marked
[ 55s] 0xc0002f2540 alloc marked
[ 55s] 0xc0002f2600 alloc marked
[ 55s] 0xc0002f26c0 alloc marked
[ 55s] 0xc0002f2780 alloc marked
[ 55s] 0xc0002f2840 alloc marked
[ 55s] 0xc0002f2900 alloc marked
[ 55s] 0xc0002f29c0 free unmarked
[ 55s] 0xc0002f2a80 alloc marked
[ 55s] 0xc0002f2b40 alloc marked
[ 55s] 0xc0002f2c00 alloc marked
[ 55s] 0xc0002f2cc0 alloc marked
[ 55s] 0xc0002f2d80 alloc marked
[ 55s] 0xc0002f2e40 alloc marked
[ 55s] 0xc0002f2f00 alloc marked
[ 55s] 0xc0002f2fc0 alloc marked
[ 55s] 0xc0002f3080 alloc marked
[ 55s] 0xc0002f3140 alloc marked
[ 55s] 0xc0002f3200 alloc marked
[ 55s] 0xc0002f32c0 alloc marked
[ 55s] 0xc0002f3380 free unmarked
[ 55s] 0xc0002f3440 free marked zombie


An excerpt from strace:
> 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa6a1b990, parent_tid=0x7fcaa6a1b990, exit_signal=0, stack=0x7fcaa621b000, stack_size=0x7ffe00, tls=0x7fcaa6a1b6c0} => {parent_tid=[2350]}, 88) = 2350

> 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> 2370 <... mmap resumed>) = 0x7fca68249000
> 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished ...>

I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some reason 0x7fca6824bec8 in that region is "bad".

thanks,--
--
js
suse labs