Re: [PATCH v2 1/2] mm: memcg: refactor page state unit helpers

From: Yosry Ahmed
Date: Thu Oct 05 2023 - 10:15:19 EST


On Thu, Oct 5, 2023 at 2:06 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> On Wed, Oct 04, 2023 at 02:36:19PM -0400, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > Yes, it's because of node resolution which event counters generally
> > don't have. Some of the refault events influence node-local reclaim
> > decisions, see mm/vmscan.c::snapshot_refaults().
> >
> > There are a few other event counters in the stat array that people
> > thought would be useful to have split out in
> > /sys/devices/system/node/nodeN/vmstat to understand numa behavior
> > better.
> >
> > It's a bit messy.
> >
> > Some events would be useful to move to 'stats' for the numa awareness,
> > such as the allocator stats and reclaim activity.
> >
> > Some events would be useful to move to 'stats' for the numa awareness,
> > but don't have the zone resolution required by them, such as
> > kswapd/kcompactd wakeups.
>
> Thanks for the enlightenment.
>
> > Some events aren't numa specific, such as oom kills, drop_pagecache.
>
> These are oddballs indeed. As with the normalization patchset these are
> counted as PAGE_SIZE^W 1 error but they should rather be an infinite
> error (to warrant a flush).
>
> So my feedback to this series is:
> - patch 1/2 -- creating two classes of units is consequence of unclarity
> between state and events (as in event=Δstate/Δt) and resolution
> (global vs per-node), so the better approach would be to tidy this up,

I am not really sure what you mean here. I understand that this series
fixes the unit normalization for state but leaves events out of it.
Looking at the event items tracked by memcg in memcg_vm_event_stat it
looks to me that most of them correspond roughly to a page's worth of
updates (all but the THP_* events). We don't track things like
OOM_KILL and DROP_PAGECACHE per memcg as far as I can tell.

Do you mean that we should add something similar to
memcg_page_state_unit() for events as well to get all of them right?
If yes, I think that should be easy to add, it would only special case
THP_* events.

Alternatively, I can add a comment above the call to
memcg_rstat_updated() in __count_memcg_events() explaining why we
don't normalize the event count for now.

> - patch 2/2 -- it could use the single unit class that exists,
> it'll bound the error of printed numbers afterall (and can be changed
> later depending on how it affects internal consumers).

This will mean that WORKINGSET_* state will become more stale. We will
need 4096 as many updates as today to get a flush. These are used by
internal flushers (reclaim), and are exposed to userspace. I am not
sure we want to do that.

>
> My 0.02€,
> Michal