Re: [PATCH 08/24] mm/swap: check readahead policy per entry

From: Huang, Ying
Date: Mon Nov 20 2023 - 20:12:15 EST


Kairui Song <ryncsn@xxxxxxxxx> writes:

> Huang, Ying <ying.huang@xxxxxxxxx> wrote on Mon, Nov 20, 2023 at 14:07:
>>
>> Kairui Song <ryncsn@xxxxxxxxx> writes:
>>
>> > From: Kairui Song <kasong@xxxxxxxxxxx>
>> >
>> > Currently VMA readahead is globally disabled when any rotational disk is
>> > used as a swap backend. So when multiple swap devices are enabled, if a
>> > slower hard disk is set as a low-priority fallback and a high-performance
>> > SSD is used as the high-priority swap device, VMA readahead is disabled
>> > globally. The SSD swap device's performance will drop significantly.
>> >
>> > Check the readahead policy per entry to avoid this problem.
>> >
>> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
>> > ---
>> > mm/swap_state.c | 12 +++++++-----
>> > 1 file changed, 7 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index ff6756f2e8e4..fb78f7f18ed7 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_
>> > 	return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
>> > }
>> >
>> > -static inline bool swap_use_vma_readahead(void)
>> > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
>> > {
>> > -	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
>> > +	return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead);
>> > }
>> >
>> > /*
>> > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
>> >
>> > 	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
>> > 	if (!IS_ERR(folio)) {
>> > -		bool vma_ra = swap_use_vma_readahead();
>> > +		bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry));
>> > 		bool readahead;
>> >
>> > 		/*
>> > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> > struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> > 			      struct vm_fault *vmf, bool *swapcached)
>> > {
>> > +	struct swap_info_struct *si;
>> > 	struct mempolicy *mpol;
>> > 	struct page *page;
>> > 	pgoff_t ilx;
>> > 	bool cached;
>> >
>> > +	si = swp_swap_info(entry);
>> > 	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
>> > -	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
>> > +	if (swap_use_no_readahead(si, entry)) {
>> > 		page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
>> > 		cached = false;
>> > -	} else if (swap_use_vma_readahead()) {
>> > +	} else if (swap_use_vma_readahead(si)) {
>>
>> It's possible that some pages are swapped out to SSD while others are
>> swapped out to HDD in a readahead window.
>>
>> I doubt that there are practical requirements to use swap on SSD and
>> HDD at the same time.
>
> Hi Ying,
>
> Thanks for the review!
>
> For the first issue, "fragmented readahead window", I was planning to
> add an extra check in the readahead path to skip readahead entries that
> are on different swap devices, which is not hard to do,

This is a possible solution.
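
To make sure we mean the same thing, here is a rough userspace sketch of
such a check. It is only a model, not kernel code: struct model_entry and
model_pick_same_device() are hypothetical names standing in for
swp_entry_t and the readahead loop, and a real version would also have to
handle the per-device offset ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for swp_entry_t: device index plus slot offset. */
struct model_entry {
	unsigned int type;	/* which swap device the entry lives on */
	unsigned long offset;	/* slot offset within that device */
};

/*
 * Walk a readahead window and collect the indices of the entries that
 * sit on the same device as the faulting entry. Entries on any other
 * device are skipped. Returns the number of indices stored in @picked,
 * which must have room for @nr elements.
 */
static size_t model_pick_same_device(const struct model_entry *win, size_t nr,
				     size_t fault_idx, size_t *picked)
{
	size_t i, n = 0;

	assert(fault_idx < nr);
	for (i = 0; i < nr; i++) {
		/* Skip neighbours swapped out to a different device. */
		if (win[i].type != win[fault_idx].type)
			continue;
		picked[n++] = i;
	}
	return n;
}
```

With a window that alternates between two devices, only the entries that
share the faulting entry's device are selected for readahead.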

> but this series is growing too long, so I thought it would be better
> done later.

You don't need to keep everything in one series. Just use multiple
series, even if they are all swap-related. They are in fact dealing
with different problems.

> For the second issue, "is there any practical use for multiple swap",
> I think there actually is. For example, we are trying to use multi-layer
> swap for offloading memory of different hotness on servers. We also
> tried to implement a mechanism to migrate long-sleeping swap entries
> from high-performance SSD/RAMDISK swap to a cheap HDD swap device, with
> more than two layers of swap. That worked except for the upstream issue
> that the readahead policy no longer works as expected.

Thanks for the information.
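
For the record, the behaviour difference for such a multi-layer setup can
be modelled in userspace roughly as below. This is only an illustration:
struct model_swap_dev, MODEL_SWP_SOLIDSTATE and both helpers are
hypothetical stand-ins for swap_info_struct, SWP_SOLIDSTATE and the old
and new swap_use_vma_readahead(), and the flag value is made up.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SWP_SOLIDSTATE 0x1u	/* made-up flag value for this model */

/* Userspace stand-in for the part of swap_info_struct used here. */
struct model_swap_dev {
	unsigned int flags;
};

/* Old check: one rotational device anywhere disables VMA readahead. */
static bool model_vma_ra_global(const struct model_swap_dev *devs, int nr,
				bool enabled)
{
	int i, nr_rotate = 0;

	assert(nr >= 0);
	for (i = 0; i < nr; i++)
		if (!(devs[i].flags & MODEL_SWP_SOLIDSTATE))
			nr_rotate++;
	return enabled && nr_rotate == 0;
}

/* Patched check: decide from the device that backs this entry only. */
static bool model_vma_ra_per_dev(const struct model_swap_dev *dev,
				 bool enabled)
{
	return (dev->flags & MODEL_SWP_SOLIDSTATE) && enabled;
}
```

With one SSD and one HDD configured, the global check turns VMA readahead
off for both devices, while the per-device check keeps it on for the SSD
and off for the HDD.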

>> > 		page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
>> > 		cached = true;
>> > 	} else {

--
Best Regards,
Huang, Ying