Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many_isolated()

From: Huang, Ying
Date: Sat Apr 24 2021 - 20:48:30 EST


Yu Zhao <yuzhao@xxxxxxxxxx> writes:
[snip]

> @@ -2966,13 +2938,20 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> /* need some check for avoid more shrink_zone() */
> }
>
> - /* See comment about same check for global reclaim above */
> - if (zone->zone_pgdat == last_pgdat)
> - continue;
> - last_pgdat = zone->zone_pgdat;
> shrink_node(zone->zone_pgdat, sc);
> }
>
> + if (last_pgdat)
> + atomic_dec(&last_pgdat->nr_reclaimers);
> + else if (should_retry) {
> + /* wait a bit for the reclaimer. */
> + if (!schedule_timeout_killable(HZ / 10))

Once we reach here, even by accident, the caller has to sleep for at
least 100ms (HZ / 10). How about using a semaphore for
pgdat->nr_reclaimers? Then the sleeper can be woken up as soon as the
resource is considered sufficient, instead of always waiting out the
full timeout.
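
To make the semaphore idea concrete, here is a minimal userspace sketch
using POSIX semaphores. It is not kernel code and not a proposed patch:
struct node_limit, node_limit_acquire() and node_limit_release() are
names made up for illustration. In the kernel the counterpart would
presumably be built on down_killable()/down_timeout() and up() from
<linux/semaphore.h>. The only point is that a waiter returns as soon as
another reclaimer releases its slot, rather than after a fixed sleep.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical stand-in for a per-node limit on concurrent reclaimers. */
struct node_limit {
	sem_t slots;	/* number of reclaimer slots still free */
};

/* Allow at most max_reclaimers concurrent holders. Returns 0 on success. */
static int node_limit_init(struct node_limit *nl, unsigned int max_reclaimers)
{
	return sem_init(&nl->slots, 0, max_reclaimers);
}

/*
 * Wait for a free slot, but no longer than timeout_ms.
 * Returns 0 once a slot is taken, -1 if the timeout expires first.
 * Unlike a fixed sleep, this returns as soon as a slot is released.
 */
static int node_limit_acquire(struct node_limit *nl, long timeout_ms)
{
	struct timespec deadline;

	/* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	while (sem_timedwait(&nl->slots, &deadline) == -1) {
		if (errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		return -1;		/* ETIMEDOUT or another error */
	}
	return 0;
}

/* Give the slot back; this wakes one waiter, if there is any. */
static void node_limit_release(struct node_limit *nl)
{
	sem_post(&nl->slots);
}
```

With a limit of one, a second node_limit_acquire() times out while the
slot is held and succeeds right after node_limit_release(), which is the
behaviour I would like the retry path to have.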

Best Regards,
Huang, Ying

> + goto retry;
> +
> + /* We are about to die and free our memory. Return now. */
> + sc->nr_reclaimed += SWAP_CLUSTER_MAX;
> + }
> +
> /*
> * Restore to original mask to avoid the impact on the caller if we
> * promoted it to __GFP_HIGHMEM.

[snip]