Re: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first

From: Jiri Slaby
Date: Mon Jul 03 2023 - 06:47:42 EST


Cc Jacob Young (from kernel bugzilla)

On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:

On 30. 06. 23, 10:28, Jiri Slaby wrote:
> 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> 2370 <... mmap resumed>) = 0x7fca68249000
> 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished ...>

I.e. IIUC, all of these are threads (CLONE_VM). Thread 2370 mapped
anonymous memory at 0x7fca68249000 - 0x7fca68288fff (262144 bytes), and
Go, running in thread 2395, thinks for some reason that 0x7fca6824bec8
in that region is "bad".
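
The claim that the address lies inside the new mapping can be checked
with a little arithmetic. The sketch below is only an illustration: it
takes the start address and length from the mmap call in the trace and
derives the end of the mapping from them. The helper name locate() and
the 4 KiB page size are assumptions made for this example, not something
taken from the thread.

```python
# Values copied from the strace output and the analysis above.
MAP_START = 0x7fca68249000   # return value of mmap() in thread 2370
MAP_LENGTH = 262144          # length argument of that mmap() call
FAULT_ADDR = 0x7fca6824bec8  # address reported as "bad" in thread 2395
PAGE_SIZE = 4096             # assumption: 4 KiB base pages on x86-64


def locate(addr, start=MAP_START, length=MAP_LENGTH, page_size=PAGE_SIZE):
    """Return (inside, offset, page_index) for addr relative to the mapping.

    inside is True when start <= addr < start + length; offset and
    page_index are only meaningful in that case.
    """
    offset = addr - start
    inside = 0 <= offset < length
    return inside, offset, offset // page_size


if __name__ == "__main__":
    inside, offset, page = locate(FAULT_ADDR)
    last_byte = MAP_START + MAP_LENGTH - 1
    print(f"mapping: {MAP_START:#x} - {last_byte:#x}")
    print(f"inside={inside} offset={offset:#x} page={page}")
```

This only confirms that the address belongs to the region returned by
mmap. It says nothing about whether the access happened before or after
the mapping was fully set up.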

Thanks for the analysis Jiri.
Is it possible from these logs to identify whether 2370 finished the
mmap operation before 2395 tried to access 0x7fca6824bec8? That access
has to happen only after mmap finishes mapping the region.

Hi,

it's hard to tell, but I assume so.

For now, forget about this overly complicated, hard-to-reproduce Go case
and concentrate on the very nice reduced test case in:
https://bugzilla.kernel.org/show_bug.cgi?id=217624
;)

FWIW, I can reproduce using the test case too.

thanks,
--
js
suse labs