[PATCH v2 1/1] mm: Optimizing hugepage zeroing in arm64

From: Prathu Baronia
Date: Tue Feb 02 2021 - 02:44:28 EST


In !HIGHMEM cases, specially in 64-bit architectures, we don't need temp
mapping of pages. Hence, k(map|unmap)_atomic() acts as nothing more than
multiple barrier() calls, for example for a 2MB hugepage in
clear_huge_page() these are called 512 times i.e. to map and unmap each
subpage that means in total 2048 barrier calls. This called for
optimization. Simply getting VADDR from page in the form of kmap_local_*
APIs does the job for us. We profiled clear_huge_page() using ftrace
and observed an improvement of 62%.

Setup:-
Below data has been collected on Qualcomm's SM7250 SoC THP enabled
(kernel
v4.19.113) with only CPU-0(Cortex-A55) and CPU-7(Cortex-A76) switched on
and set to max frequency, also DDR set to perf governor.

FTRACE Data:-

Base data:-
Number of iterations: 48
Mean of allocation time: 349.5 us
std deviation: 74.5 us

v1 data:-
Number of iterations: 48
Mean of allocation time: 131 us
std deviation: 32.7 us

The following simple userspace experiment to allocate
100MB(BUF_SZ) of pages and writing to it gave us a good insight,
we observed an improvement of 42% in allocation and writing timings.
-------------------------------------------------------------
Test code snippet
-------------------------------------------------------------
clock_start();
buf = malloc(BUF_SZ); /* Allocate 100 MB of memory */

for(i=0; i < BUF_SZ_PAGES; i++)
{
*((int *)(buf + (i*PAGE_SIZE))) = 1;
}
clock_end();
-------------------------------------------------------------

Malloc test timings for 100MB anon allocation:-

Base data:-
Number of iterations: 100
Mean of allocation time: 31831 us
std deviation: 4286 us

v1 data:-
Number of iterations: 100
Mean of allocation time: 18193 us
std deviation: 4915 us

Reported-by: Chintan Pandya <chintan.pandya@xxxxxxxxxxx>
Signed-off-by: Prathu Baronia <prathu.baronia@xxxxxxxxxxx>
---
include/linux/highmem.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d2c70d3772a3..444df139b489 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -146,9 +146,9 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
#ifndef clear_user_highpage
static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
{
- void *addr = kmap_atomic(page);
+ void *addr = kmap_local_page(page);
clear_user_page(addr, vaddr, page);
- kunmap_atomic(addr);
+ kunmap_local(addr);
}
#endif

--
2.17.1