Re: [PATCH] mm/vmalloc: Fix unlock order in s_stop()

From: Uladzislau Rezki
Date: Mon Dec 14 2020 - 12:57:58 EST


On Mon, Dec 14, 2020 at 03:37:46PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 14, 2020 at 04:11:28PM +0100, Uladzislau Rezki wrote:
> > On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> > > If we need to iterate the list efficiently, I'd suggest getting rid of
> > > the list and using an xarray instead. Maybe a maple tree, once that code
> > > is better exercised.
> >
> > Not really efficiently. We need just a full scan of it propagating the
> > information about mapped and un-purged areas to user space applications.
> >
> > For example, an RCU-safe list is what we need, IMHO. On the other hand, I
> > am not sure whether xarray is RCU safe in the context of concurrently removing/adding
> > an element (xa_erase()/xa_insert()) while scanning with xa_for_each_XXX().
>
> It's as RCU safe as an RCU-safe list. Specifically, it guarantees:
>
> - If an element is present at all times between the start and the
> end of the iteration, it will appear in the iteration.
> - No element will appear more than once.
> - No element will appear in the iteration that was never present.
> - The iteration will terminate.
>
> If an element is added or removed between the start and end of the
> iteration, it may or may not appear. Causality is not guaranteed (eg
> if modification A is made before modification B, modification B may
> be reflected in the iteration while modification A is not).
>
Thank you for the information! Making use of xarray would require migrating
our current vmap_area_root RB-tree to xarray. That probably makes sense only
if the migration brings performance benefits. But running the vmalloc
benchmark apparently shows quite a big degradation:

# X-array
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real 0m18.928s
user 0m0.017s
sys 0m0.004s
urezki@pc638:~$
[ 90.103768] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1275773 usec
[ 90.103771] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1439371 usec
[ 90.103772] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 9138051 usec
[ 90.103773] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 4821400 usec
[ 90.103774] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 2181207 usec
[ 90.103775] All test took CPU0=69774784667 cycles

# RB-tree
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real 0m13.975s
user 0m0.013s
sys 0m0.010s
urezki@pc638:~$
[ 26.633372] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 429836 usec
[ 26.633375] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 566042 usec
[ 26.633377] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 7663974 usec
[ 26.633378] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 3853388 usec
[ 26.633379] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1370097 usec
[ 26.633380] All test took CPU0=51370095742 cycles

I suppose xa_load() provides O(log(n)) search time, right?

--
Vlad Rezki