Re: [PATCH] mm: zswap: fix pool refcount bug around shrink_worker()

From: Nhat Pham
Date: Fri Oct 06 2023 - 17:41:07 EST


On Fri, Oct 6, 2023 at 9:00 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> When a zswap store fails due to the limit, it acquires a pool
> reference and queues the shrinker. When the shrinker runs, it drops
> the reference. However, there can be multiple store attempts before
> the shrinker wakes up and runs once. This results in reference leaks
> and eventual saturation warnings for the pool refcount.
>
> Fix this by dropping the reference again when the shrinker is already
> queued. This ensures one reference per shrinker run.
>
> Reported-by: Chris Mason <clm@xxxxxx>
> Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
> Cc: stable@xxxxxxxxxxxxxxx [5.6+]
> Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> Cc: Domenico Cerasuolo <cerasuolodomenico@xxxxxxxxx>
> Cc: Nhat Pham <nphamcs@xxxxxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
> mm/zswap.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 083c693602b8..37d2b1cb2ecb 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1383,8 +1383,8 @@ bool zswap_store(struct folio *folio)
>
>  shrink:
>  	pool = zswap_pool_last_get();
> -	if (pool)
> -		queue_work(shrink_wq, &pool->shrink_work);
> +	if (pool && !queue_work(shrink_wq, &pool->shrink_work))
> +		zswap_pool_put(pool);
>  	goto reject;
>  }
>
> --
> 2.42.0
>

Acked-by: Nhat Pham <nphamcs@xxxxxxxxx>
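
For anyone following along, here is a rough user-space sketch of the
accounting described in the changelog. It is not kernel code: the toy_*
names are hypothetical stand-ins, and the only kernel behavior it leans on
is that queue_work() returns false when the work item is already pending.
It assumes the shrinker runs once after a burst of failed stores.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel objects; every toy_* name is hypothetical. */
struct toy_pool {
	int refs;          /* models the pool refcount */
	bool work_pending; /* models the work item's pending bit */
};

static void toy_pool_get(struct toy_pool *p)
{
	p->refs++;
}

static void toy_pool_put(struct toy_pool *p)
{
	assert(p->refs > 0);
	p->refs--;
}

/* Like queue_work(): returns false if the work was already pending. */
static bool toy_queue_work(struct toy_pool *p)
{
	if (p->work_pending)
		return false;
	p->work_pending = true;
	return true;
}

/* Failed store, old behavior: the reference is kept unconditionally. */
static void toy_store_fail_old(struct toy_pool *p)
{
	toy_pool_get(p);
	toy_queue_work(p);
}

/* Failed store, patched behavior: drop the ref if already queued. */
static void toy_store_fail_new(struct toy_pool *p)
{
	toy_pool_get(p);
	if (!toy_queue_work(p))
		toy_pool_put(p);
}

/* One shrinker run: clears the pending bit and drops exactly one ref. */
static void toy_shrink_run(struct toy_pool *p)
{
	p->work_pending = false;
	toy_pool_put(p);
}

/* References leaked after `fails` failed stores and one shrinker run. */
static int toy_leaked_refs(int fails, bool patched)
{
	struct toy_pool p = { .refs = 1, .work_pending = false };

	for (int i = 0; i < fails; i++) {
		if (patched)
			toy_store_fail_new(&p);
		else
			toy_store_fail_old(&p);
	}
	if (p.work_pending)
		toy_shrink_run(&p);

	return p.refs - 1; /* the initial reference is not a leak */
}
```

In this model, N failed stores before a single shrinker run leave N - 1
stray references with the old logic and none with the patched logic,
which matches the "one reference per shrinker run" wording above.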

Random tangent: this asynchronous writeback mechanism
has always seemed kinda weird to me. We could have quite a bit of memory
inversion before the shrinker finally kicks in and frees up zswap
pool space. But I guess if it isn't broken, don't fix it.

Maybe a shrinker that proactively writes pages back as memory
pressure builds up could help ;)