Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

From: Chengming Zhou
Date: Thu Dec 14 2023 - 10:04:05 EST


On 2023/12/14 21:57, Chengming Zhou wrote:
> On 2023/12/14 21:37, Yosry Ahmed wrote:
>> On Thu, Dec 14, 2023 at 5:33 AM Chengming Zhou
>> <zhouchengming@xxxxxxxxxxxxx> wrote:
>>>
>>> On 2023/12/14 08:18, Nhat Pham wrote:
>>>> On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
>>>>> <zhouchengming@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Change the dstmem size from 2 * PAGE_SIZE to only one page since
>>>>>> we only need at most one page when compress, and the "dlen" is also
>>>>>> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
>>>>>> we don't wanna store the output in zswap anyway.
>>>>>>
>>>>>> So change it to one page, and delete the stale comment.
>>>>>
>>>>> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
>>>>> be nice if someone has the context, perhaps one of the maintainers.
>>>>
>>>> It'd be very nice indeed.
>>>>
>>>>>
>>>>> One potential reason is that we used to store a zswap header
>>>>> containing the swap entry in the compressed page for writeback
>>>>> purposes, but we don't do that anymore. Maybe we wanted to be able to
>>>>> handle the case where an incompressible page would exceed PAGE_SIZE
>>>>> because of that?
>>>>
>>>> It could be hmm. I didn't study the old zswap architecture too much,
>>>> but it has been 2 * PAGE_SIZE since the time zswap was first merged
>>>> last I checked.
>>>> I'm not 100% comfortable ACK-ing the undoing of something that looks
>>>> so intentional, but FTR, AFAICT, this looks correct to me.
>>>
>>> Right, there is no any history about the reason why we needed 2 pages.
>>> But obviously only one page is needed from the current code and no any
>>> problem found in the kernel build stress testing.
>>
>> Could you try manually stressing the compression with data that
>> doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
>> that this case is specifically handled. I think using data from
>> /dev/random will do that but please double check that dlen ==
>> PAGE_SIZE.
>
> I just did the same kernel build testing, indeed there are a few cases
> that output dlen == PAGE_SIZE.
>
> bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
>
> @[1]: 2
> @[0]: 12011430

I think we shouldn't put these poorly compressed output into zswap,
maybe it's better to early return in these cases when compress ratio
< threshold ratio, which can be tune by the user?

e.g. in the same kernel build testing:

bpftrace -e 'k:zpool_malloc {@[(uint32)arg1>2048]=count()}'

@[1]: 1597706
@[0]: 10886138