Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

From: Mel Gorman
Date: Thu Jan 06 2022 - 04:46:57 EST


On Sun, Jan 02, 2022 at 12:31:29PM +0900, skseofh@xxxxxxxxx wrote:
> From: Daero Lee <skseofh@xxxxxxxxx>
>
> In kswapd_try_to_sleep function, to check whether kswapd can sleep,
> the prepare_kswapd_sleep function is called twice.
>
> If free pages are below high-watermark in the first call,
> the @remaining variable is not updated at 0 and the
> prepare_kswapd_sleep function is called for the second time.
>
> I think it is necessary to set the initial value of the
> @remaining to a non-zero value to prevent consecutive calls
> to the same function.
>
> Signed-off-by: Daero Lee <skseofh@xxxxxxxxx>
> ---
> mm/vmscan.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 700434db5735..1217ecec5bbb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> /*
> * Return the order kswapd stopped reclaiming at as
> * prepare_kswapd_sleep() takes it into account. If another caller
> - * entered the allocator slow path while kswapd was awake, order will
> + * entered the allqocator slow path while kswapd was awake, order will
> * remain at the higher level.
> */
> return sc.order;

This hunk just adds a typo, drop it.

> @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> unsigned int highest_zoneidx)
> {
> - long remaining = 0;
> + long remaining = ~0;
> DEFINE_WAIT(wait);
>
> if (freezing(current) || kthread_should_stop())

While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
is balanced on the first try, it then does not restore the vmstat
thresholds and doesn't call schedule() for kswapd to go to sleep.

I think you did spot a problem, but I suspect you want something like
the following untested patch:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 700434db5735..40784693c840 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4355,7 +4355,8 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
unsigned int highest_zoneidx)
{
- long remaining = 0;
+ long remaining;
+ bool balanced;
DEFINE_WAIT(wait);

if (freezing(current) || kthread_should_stop())
@@ -4370,7 +4371,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
* eligible zone balanced that it's also unlikely that compaction will
* succeed.
*/
- if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+ balanced = prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx);
+ if (balanced) {
/*
* Compaction records what page blocks it recently failed to
* isolate pages from and skips them in the future scanning.
@@ -4387,6 +4389,10 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o

remaining = schedule_timeout(HZ/10);

+ /* Is pgdat balanced after a short sleep? */
+ balanced = prepare_kswapd_sleep(pgdat, reclaim_order,
+ highest_zoneidx);
+
/*
* If woken prematurely then reset kswapd_highest_zoneidx and
* order. The values will either be from a wakeup request or
@@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
}

/*
- * After a short sleep, check if it was a premature sleep. If not, then
- * go fully to sleep until explicitly woken up.
+ * If balanced to the high watermark, restore vmstat thresholds and
+ * kswapd goes to sleep. If kswapd remains awake, account whether
+ * the low or high watermark was hit quickly.
*/
- if (!remaining &&
- prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+ if (balanced) {
trace_mm_vmscan_kswapd_sleep(pgdat->node_id);

/*