Re: [PATCH 08/24] mm/swap: check readahead policy per entry

From: Kairui Song
Date: Mon Nov 20 2023 - 06:19:17 EST


Huang, Ying <ying.huang@xxxxxxxxx> wrote on Mon, Nov 20, 2023 at 14:07:
>
> Kairui Song <ryncsn@xxxxxxxxx> writes:
>
> > From: Kairui Song <kasong@xxxxxxxxxxx>
> >
> > Currently VMA readahead is globally disabled when any rotating disk is
> > used as a swap backend. So when multiple swap devices are enabled, if a
> > slower hard disk is set as a low priority fallback and a high performance
> > SSD is used as the high priority swap device, VMA readahead is disabled
> > globally. The SSD swap device performance will drop by a lot.
> >
> > Check readahead policy per entry to avoid such a problem.
> >
> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> > ---
> > mm/swap_state.c | 12 +++++++-----
> > 1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index ff6756f2e8e4..fb78f7f18ed7 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_
> > return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
> > }
> >
> > -static inline bool swap_use_vma_readahead(void)
> > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
> > {
> > - return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
> > + return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead);
> > }
> >
> > /*
> > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
> >
> > folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
> > if (!IS_ERR(folio)) {
> > - bool vma_ra = swap_use_vma_readahead();
> > + bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry));
> > bool readahead;
> >
> > /*
> > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > struct vm_fault *vmf, bool *swapcached)
> > {
> > + struct swap_info_struct *si;
> > struct mempolicy *mpol;
> > struct page *page;
> > pgoff_t ilx;
> > bool cached;
> >
> > + si = swp_swap_info(entry);
> > mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
> > - if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
> > + if (swap_use_no_readahead(si, entry)) {
> > page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
> > cached = false;
> > - } else if (swap_use_vma_readahead()) {
> > + } else if (swap_use_vma_readahead(si)) {
>
> It's possible that some pages are swapped out to SSD while others are
> swapped out to HDD in a readahead window.
>
> I doubt that there are practical requirements to use swap on SSD and
> HDD at the same time.

Hi Ying,

Thanks for the review!

For the first issue, the "fragmented readahead window", I was planning to
add an extra check in the readahead path to skip readahead entries that
are on a different swap device. That is not hard to do, but this series
is already growing too long, so I thought it would be better done later.
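
To illustrate the kind of check meant here, below is a rough userspace
model only, not code from this series. The struct model_entry type and
the filter_ra_window() helper are hypothetical names made up for this
sketch; in the kernel the device index would presumably come from
swp_type(entry), and the check would sit inside the readahead loop rather
than in a separate filter pass.

```c
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for a swap entry. "type" models the
 * swap device index, "offset" the slot on that device.
 */
struct model_entry {
	unsigned int type;
	unsigned long offset;
};

/*
 * Copy readahead candidates from win[0..n) into out[], keeping only the
 * entries that live on the same device as the faulting entry. Entries on
 * any other device are skipped. Returns the number of entries kept.
 */
static size_t filter_ra_window(struct model_entry fault,
			       const struct model_entry *win, size_t n,
			       struct model_entry *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (win[i].type != fault.type)
			continue;	/* different swap device: skip */
		out[kept++] = win[i];
	}
	return kept;
}
```

With a window that mixes two devices, only the entries sharing the
faulting entry's device survive, so a slow HDD slot never gets pulled in
by a fault on the SSD.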

For the second issue, "is there any practical use for multiple swap
devices", I think there actually is. For example, we are trying to use
multi-layer swap to offload memory of different hotness on servers. We
also tried to implement a mechanism to migrate long-sleeping swap entries
from high performance SSD/RAMDISK swap to a cheap HDD swap device, with
more than two layers of swap. That worked, except for the upstream issue
that the readahead policy no longer works as expected.
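
For such a layered setup, the per-entry decision made by this patch can
be modelled in userspace roughly as below. This is only a sketch: the
MODEL_* flags, struct model_swap_dev and pick_policy() are hypothetical
names for illustration, mirroring the order of the checks in
swapin_readahead() in the quoted diff. The fallback is labelled "cluster"
on the assumption that the final else branch does cluster readahead.

```c
#include <stdbool.h>

/* Hypothetical flag bits modelling SWP_SOLIDSTATE / SWP_SYNCHRONOUS_IO. */
#define MODEL_SWP_SOLIDSTATE		(1u << 0)
#define MODEL_SWP_SYNCHRONOUS_IO	(1u << 1)

/* Hypothetical stand-in for the per-device swap_info_struct. */
struct model_swap_dev {
	unsigned int flags;
};

enum model_ra_policy {
	MODEL_RA_NONE,		/* read only the faulting entry */
	MODEL_RA_VMA,		/* VMA based readahead */
	MODEL_RA_CLUSTER,	/* assumed fallback path */
};

/*
 * Pick the readahead policy from the flags of the device that backs the
 * faulting entry, instead of from a global "any rotating disk" counter.
 */
static enum model_ra_policy pick_policy(const struct model_swap_dev *si,
					int swap_count, bool vma_ra_enabled)
{
	if ((si->flags & MODEL_SWP_SYNCHRONOUS_IO) && swap_count == 1)
		return MODEL_RA_NONE;
	if ((si->flags & MODEL_SWP_SOLIDSTATE) && vma_ra_enabled)
		return MODEL_RA_VMA;
	return MODEL_RA_CLUSTER;
}
```

In this model an SSD tier keeps VMA readahead even when an HDD tier is
also enabled, which is the behaviour the multi-layer setup depends on.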


>
> > page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
> > cached = true;
> > } else {
>
> --
> Best Regards,
> Huang, Ying