Re: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole

From: Ryan Roberts
Date: Fri Mar 15 2024 - 08:07:09 EST


On 15/03/2024 10:01, Barry Song wrote:
> On Fri, Mar 15, 2024 at 10:17 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>
>> Barry Song <21cnbao@xxxxxxxxx> writes:
>>
>>> On Fri, Mar 15, 2024 at 9:43 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>>>
>>>> Barry Song <21cnbao@xxxxxxxxx> writes:
>>>>
>>>>> From: Chuanhua Han <hanchuanhua@xxxxxxxx>
>>>>>
>>>>> On an embedded system like Android, more than half of anon memory is
>>>>> actually in swap devices such as zRAM. For example, while an app is
>>>>> switched to background, its most memory might be swapped-out.
>>>>>
>>>>> Now we have mTHP features, unfortunately, if we don't support large folios
>>>>> swap-in, once those large folios are swapped-out, we immediately lose the
>>>>> performance gain we can get through large folios and hardware optimization
>>>>> such as CONT-PTE.
>>>>>
>>>>> This patch brings up mTHP swap-in support. Right now, we limit mTHP swap-in
>>>>> to those contiguous swaps which were likely swapped out from mTHP as a
>>>>> whole.
>>>>>
>>>>> Meanwhile, the current implementation only covers the SWAP_SYCHRONOUS
>>>>> case. It doesn't support swapin_readahead as large folios yet since this
>>>>> kind of shared memory is much less than memory mapped by single process.
>>>>
>>>> In contrast, I still think that it's better to start with normal swap-in
>>>> path, then expand to SWAP_SYCHRONOUS case.
>>>
>>> I'd rather try the reverse direction as non-sync anon memory is only around
>>> 3% in a phone as my observation.
>>
>> Phone is not the only platform that Linux is running on.
>
> I suppose it's generally true that forked shared anonymous pages only
> constitute a small portion of all anonymous pages. The majority of
> anonymous pages are within a single process.
>
> I agree phones are not the only platform. But Rome wasn't built in a
> day. I can only get started on a hardware which I can easily reach and
> have enough hardware/test resources on it. So we may take the first step
> which can be applied on a real product and improve its performance, and
> step by step, we broaden it and make it widely useful to various areas
> in which I can't reach :-)
>
> so probably we can have a sysfs "enable" entry with default "n" or
> have a maximum swap-in order as Ryan's suggestion [1] at the beginning,

I wasn't necessarily suggesting that we should hard-code an upper limit. I was
just pointing out that we likely need some policy somewhere, because the right
thing very likely depends on the folio size and workload. And a similar policy
is likely needed for CoW.

We already have per-thp-size directories in sysfs, so there is a natural place
to add new controls as you suggest - that would fit well. Of course, if we can
avoid exposing yet more controls, that would be preferable.
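
As a rough illustration of what such a per-size policy could look like, here is
a hypothetical userspace sketch in C. It is not code from this series: the
names swapin_enabled[] and swapin_order() are invented for the example, the
table stands in for a per-size sysfs control that does not exist today, and the
defaults (enabled up to 64K, assuming 4K base pages) are only an assumption
taken from the "small" large folio figure quoted below.

```c
/* Orders 0..9 cover 4K..2M when the base page size is 4K (assumption). */
#define NR_ORDERS 10

/*
 * Hypothetical per-order switch, standing in for a per-size sysfs control.
 * Orders 0..4 (4K..64K) are enabled; the remaining entries default to 0.
 */
const int swapin_enabled[NR_ORDERS] = { 1, 1, 1, 1, 1 };

/*
 * Pick the order to swap in for a folio that was swapped out at
 * swapout_order: the largest enabled order that does not exceed it.
 * Order 0 (a single page) is always allowed as the fallback.
 */
int swapin_order(int swapout_order)
{
	int order = swapout_order;

	if (order < 0)
		order = 0;
	if (order >= NR_ORDERS)
		order = NR_ORDERS - 1;

	while (order > 0 && !swapin_enabled[order])
		order--;

	return order;
}
```

With these assumed defaults, a folio swapped out at order 4 (64K) would be
swapped in as a whole, while one swapped out at order 9 (2M) would fall back to
order 4. Whether a static table like this is enough, or whether the choice
should also depend on the workload, is exactly the open question.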

>
> "
> So in the common case, swap-in will pull in the same size of folio as was
> swapped-out. Is that definitely the right policy for all folio sizes? Certainly
> it makes sense for "small" large folios (e.g. up to 64K IMHO). But I'm not sure
> it makes sense for 2M THP; As the size increases the chances of actually needing
> all of the folio reduces so chances are we are wasting IO. There are similar
> arguments for CoW, where we currently copy 1 page per fault - it probably makes
> sense to copy the whole folio up to a certain size.
> "
>