Re: [PATCH] psi: fix PSI_MEM_FULL state when tasks are in memstall and doing reclaim

From: Johannes Weiner
Date: Fri Nov 12 2021 - 11:53:25 EST


On Wed, Nov 10, 2021 at 09:33:12PM +0000, Brian Chen wrote:
> We've noticed cases where tasks in a cgroup are stalled on memory but
> there is little memory FULL pressure since tasks stay on the runqueue
> in reclaim.
>
> A simple example involves a single threaded program that keeps leaking
> and touching large amounts of memory. It runs in a cgroup with swap
> enabled, memory.high set at 10M and cpu.max ratio set at 5%. Though
> there is significant CPU pressure and memory SOME, there is barely any
> memory FULL since the task enters reclaim and stays on the runqueue.
> However, this memory-bound task is effectively stalled on memory and
> we expect memory FULL to match memory SOME in this scenario.
>
> The code is confused about memstall && running, thinking there is a
> stalled task and a productive task when there's only one task: a
> reclaimer that's counted as both. To fix this, we redefine the
> condition for PSI_MEM_FULL to check that all running tasks are in an
> active memstall instead of checking that there are no running tasks.
>
> 	case PSI_MEM_FULL:
> -		return unlikely(tasks[NR_MEMSTALL] && !tasks[NR_RUNNING]);
> +		return unlikely(tasks[NR_MEMSTALL] &&
> +			tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
>
> This will capture reclaimers. It will also capture tasks that called
> psi_memstall_enter() and are about to sleep, but this should be
> negligible noise.
>
> Signed-off-by: Brian Chen <brianchen118@xxxxxxxxx>

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

This bug essentially causes us to count memory-some in walltime and
memory-full in tasktime, which can be quite confusing and misleading
in combined CPU and memory pressure situations.
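
To make the difference concrete, here is a rough userspace sketch of the
two conditions. It is not the kernel code: the counter array is a
simplified stand-in for the per-group task counts, and the helper names
mem_full_old()/mem_full_new() and the sample states are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the per-group task counters (illustrative only). */
enum { NR_RUNNING, NR_MEMSTALL, NR_MEMSTALL_RUNNING, NR_COUNTS };

/* Old test: FULL only if some task is stalled and nothing is runnable. */
static bool mem_full_old(const unsigned int tasks[NR_COUNTS])
{
	return tasks[NR_MEMSTALL] && !tasks[NR_RUNNING];
}

/* Patched test: FULL if every runnable task is itself in a memstall. */
static bool mem_full_new(const unsigned int tasks[NR_COUNTS])
{
	/* A running memstall task is counted in both of the other counters. */
	assert(tasks[NR_MEMSTALL_RUNNING] <= tasks[NR_RUNNING]);
	assert(tasks[NR_MEMSTALL_RUNNING] <= tasks[NR_MEMSTALL]);

	return tasks[NR_MEMSTALL] &&
	       tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING];
}

/* One reclaimer on the runqueue: running and in memstall at once. */
static const unsigned int lone_reclaimer[NR_COUNTS] = {
	[NR_RUNNING] = 1, [NR_MEMSTALL] = 1, [NR_MEMSTALL_RUNNING] = 1,
};

/* A reclaimer next to a task that is making real progress. */
static const unsigned int reclaimer_and_worker[NR_COUNTS] = {
	[NR_RUNNING] = 2, [NR_MEMSTALL] = 1, [NR_MEMSTALL_RUNNING] = 1,
};

/* A stalled task that is asleep, e.g. waiting on a refault. */
static const unsigned int sleeping_staller[NR_COUNTS] = {
	[NR_RUNNING] = 0, [NR_MEMSTALL] = 1, [NR_MEMSTALL_RUNNING] = 0,
};
```

For lone_reclaimer the old test reports no FULL state while the patched
test does. The two tests agree on the other two states: no FULL while a
productive task is runnable, FULL when the only stalled task is asleep.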

The fix looks good to me, thanks Brian.

The bug's been there since the initial psi commit, so I don't think a
stable backport is warranted.

Peter, absent objections, can you please pick this up through -tip?