[PATCH v3 0/7] swapin refactor for optimization and unified readahead

From: Kairui Song
Date: Mon Jan 29 2024 - 12:55:07 EST


From: Kairui Song <kasong@xxxxxxxxxxx>

This series unifies and cleans up the swapin path, introduces minor
optimizations, and makes both shmem and swapoff use the
SWP_SYNCHRONOUS_IO flag to skip readahead and the swap cache for better
performance.
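
To illustrate the policy described above, here is a toy userspace model
in Python. It is not the kernel code: the function name, its two boolean
parameters and the enum are hypothetical, and the real swapin path
checks further conditions that are omitted here.

```python
from enum import Enum


class SwapinPath(Enum):
    DIRECT = "direct read, no readahead, swap cache skipped"
    VMA_READAHEAD = "VMA based readahead"
    CLUSTER_READAHEAD = "cluster based readahead"


def choose_swapin_path(synchronous_io: bool, vma_readahead: bool) -> SwapinPath:
    """Toy model of a unified swapin policy (hypothetical, simplified)."""
    if synchronous_io:
        # The swap device has SWP_SYNCHRONOUS_IO set (e.g. ZRAM): skip
        # readahead and the swap cache, as the series description says.
        return SwapinPath.DIRECT
    if vma_readahead:
        # Assumption: a single switch selects VMA readahead.
        return SwapinPath.VMA_READAHEAD
    return SwapinPath.CLUSTER_READAHEAD


if __name__ == "__main__":
    for sync in (True, False):
        for vma in (True, False):
            path = choose_swapin_path(sync, vma)
            print(f"synchronous_io={sync} vma_readahead={vma} -> {path.name}")
```

The point of a single entry point is that every caller (page fault,
shmem, swapoff) gets the same decision instead of each open-coding its
own variant.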

Test results:
- swap out 10G of zero-filled data to ZRAM, then read it back in:
Before: 11143285 us
After: 10692644 us (+4.1%)

- swapping off a 10G ZRAM (lzo-rle) after the same workload:
Before:
time swapoff /dev/zram0
real 0m12.337s
user 0m0.001s
sys 0m12.329s

After:
time swapoff /dev/zram0
real 0m9.728s
user 0m0.001s
sys 0m9.719s

- shmem FIO test 1 on a Ryzen 5900HX:
fio -name=tmpfs --numjobs=16 --directory=/tmpfs --size=960m \
--ioengine=mmap --rw=randread --random_distribution=zipf:0.5 \
--time_based --ramp_time=1m --runtime=5m --group_reporting
(using brd as swap, 2G memcg limit)

Before:
bw ( MiB/s): min= 1167, max= 1732, per=100.00%, avg=1460.82, stdev= 4.38, samples=9536
iops : min=298938, max=443557, avg=373964.41, stdev=1121.27, samples=9536
After (+3.5%):
bw ( MiB/s): min= 1285, max= 1738, per=100.00%, avg=1512.88, stdev= 4.34, samples=9456
iops : min=328957, max=445105, avg=387294.21, stdev=1111.15, samples=9456

- shmem FIO test 2 on a Ryzen 5900HX:
fio -name=tmpfs --numjobs=16 --directory=/tmpfs --size=960m \
--ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
--time_based --ramp_time=1m --runtime=5m --group_reporting
(using brd as swap, 2G memcg limit)

Before:
bw ( MiB/s): min= 5296, max= 7112, per=100.00%, avg=6131.93, stdev=17.09, samples=9536
iops : min=1355934, max=1820833, avg=1569769.11, stdev=4375.93, samples=9536
After (+3.1%):
bw ( MiB/s): min= 5466, max= 7173, per=100.00%, avg=6324.51, stdev=16.66, samples=9521
iops : min=1399355, max=1836435, avg=1619068.90, stdev=4263.94, samples=9521

- Some built objects are very slightly smaller (gcc 13.2.1):
/scripts/bloat-o-meter ./vmlinux ./vmlinux.new
add/remove: 4/2 grow/shrink: 1/10 up/down: 818/-983 (-165)
Function                                     old     new   delta
swapin_entry                                   -     482    +482
mm_counter                                     -     248    +248
shmem_swapin_folio                          1412    1468     +56
__pfx_swapin_entry                             -      16     +16
__pfx_mm_counter                               -      16     +16
__read_swap_cache_async                      738     736      -2
copy_present_pte                            1258    1249      -9
mem_cgroup_swapin_charge_folio               297     285     -12
__pfx_swapin_readahead                        16       -     -16
swap_cache_get_folio                         364     345     -19
do_anonymous_page                           1488    1458     -30
unuse_pte_range                              889     833     -56
free_p4d_range                               524     446     -78
restore_exclusive_pte                        937     822    -115
do_swap_page                                2969    2817    -152
swapin_readahead                             239       -    -239
copy_nonpresent_pte                         1478    1223    -255
Total: Before=26056243, After=26056078, chg -0.00%
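
The relative changes for the results above can be recomputed with a
short Python sketch. The helper names are hypothetical, and the letter
does not state which convention or rounding its percentages use, so the
printed values may differ from the quoted ones in the last digit.

```python
def time_reduction_pct(before: float, after: float) -> float:
    """Share of the original run time that was saved, in percent."""
    return (before - after) / before * 100.0


def throughput_gain_pct(before: float, after: float) -> float:
    """Increase in throughput relative to the original, in percent."""
    return (after - before) / before * 100.0


# Numbers copied from the test results above.
zram_swapin = time_reduction_pct(11143285, 10692644)   # microseconds
swapoff_real = time_reduction_pct(12.337, 9.728)       # seconds, "real"
fio_zipf_0_5 = throughput_gain_pct(1460.82, 1512.88)   # avg MiB/s
fio_zipf_1_2 = throughput_gain_pct(6131.93, 6324.51)   # avg MiB/s

if __name__ == "__main__":
    print(f"ZRAM swapin time saved:  {zram_swapin:.2f}%")
    print(f"swapoff real time saved: {swapoff_real:.2f}%")
    print(f"fio zipf:0.5 bw gain:    {fio_zipf_0_5:.2f}%")
    print(f"fio zipf:1.2 bw gain:    {fio_zipf_1_2:.2f}%")
```

By this measure swapoff, which has no percentage quoted above, takes
roughly 21% less wall-clock time.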

V2: https://lore.kernel.org/linux-mm/20240102175338.62012-1-ryncsn@xxxxxxxxx/
Update from V2:
- Many code path cleanups (merge swapin_entry with swapin_entry_mpol,
drop the second param of mem_cgroup_swapin_charge_folio, and have
swapin_entry take a pointer to a folio as the return value instead of
a pointer to a boolean, reducing LOC and logic), thanks to Huang, Ying.
- Don't use cluster readahead for swapoff; it performs worse than VMA
readahead on NVMe.
- Add a refactor patch for swap_cache_get_folio.

V1: https://lore.kernel.org/linux-mm/20231119194740.94101-1-ryncsn@xxxxxxxxx/T/
Update from V1:
- Rebased on mm-unstable.
- Remove behaviour-changing patches; they will be submitted in a
separate series later.
- Code style, naming and comments updates.
- Thanks to Chris Li for very detailed and helpful review of V1. Thanks
to Matthew Wilcox and Huang Ying for helpful suggestions.

Kairui Song (7):
mm/swapfile.c: add back some comment
mm/swap: move no readahead swapin code to a stand-alone helper
mm/swap: always account swapped in page into current memcg
mm/swap: introduce swapin_entry for unified readahead policy
mm/swap: avoid a duplicated swap cache lookup for SWP_SYNCHRONOUS_IO
mm/swap, shmem: use unified swapin helper for shmem
mm/swap: refactor swap_cache_get_folio

 include/linux/memcontrol.h |   4 +-
 mm/memcontrol.c            |   5 +-
 mm/memory.c                |  45 ++--------
 mm/shmem.c                 |  50 +++++++----
 mm/swap.h                  |  23 ++---
 mm/swap_state.c            | 176 ++++++++++++++++++++++++++-----------
 mm/swapfile.c              |  20 +++--
 7 files changed, 190 insertions(+), 133 deletions(-)

--
2.43.0