Re: [PATCH v2] workingset: ensure memcg is valid for recency check

From: Johannes Weiner
Date: Fri Aug 18 2023 - 10:13:39 EST


On Thu, Aug 17, 2023 at 12:01:26PM -0700, Nhat Pham wrote:
> static bool lru_gen_test_recent(void *shadow, bool file, struct lruvec **lruvec,
> unsigned long *token, bool *workingset)
> {
> - int memcg_id;
> unsigned long min_seq;
> struct mem_cgroup *memcg;
> struct pglist_data *pgdat;
>
> - unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
> + unpack_shadow(shadow, &memcg, &pgdat, token, workingset);
> + if (!mem_cgroup_disabled() && !memcg)
> + return false;
>
> - memcg = mem_cgroup_from_id(memcg_id);
> *lruvec = mem_cgroup_lruvec(memcg, pgdat);
> + mem_cgroup_put(memcg);
>
> min_seq = READ_ONCE((*lruvec)->lrugen.min_seq[file]);
> return (*token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));

This isn't quite right. lruvec's lifetime is bound to the memcg, so
the put has to happen after the code is done accessing it.

lru_gen_refault() is still using the eviction lruvec after the recency
test - but only if eviction_lruvec == refault_lruvec. This gives me
pause, because they're frequently different. This is a common setup:

root - slice - service 1
`- service 2

where slice has the limit and will be the cause of evictions, whereas
the service groups have the actual pages and see the refaults.

workingset_eviction() and workingset_refault() will do recency
evaluation against slice, and refault accounting against service X.

MGLRU will use service X to determine recency, which 1) will not
properly activate when one service is thrashing while the other is
dominating the memory allowance of slice, and 2) will not detect
refaults of pages shared between the services.

Maybe it's time to fix this first and bring the two codebases in
unison. Then the recency test and eviction group lifetime can be
encapsulated in test_recent(), and _refault() can just use
folio_memcg() to do the activation and accounting against.