Re: [PATCH 2/2] mm: Consider subtrees in memory.events

From: Michal Hocko
Date: Fri Jan 25 2019 - 03:42:19 EST


On Thu 24-01-19 13:23:28, Johannes Weiner wrote:
> On Thu, Jan 24, 2019 at 06:01:17PM +0100, Michal Hocko wrote:
> > On Thu 24-01-19 11:00:10, Johannes Weiner wrote:
> > [...]
> > > We cannot fully eliminate a risk for regression, but it strikes me as
> > > highly unlikely, given the extremely young age of cgroup2-based system
> > > management and surrounding tooling.
> >
> > I am not really sure what you consider young but this interface is 4.0+
> > IIRC and the cgroup v2 is considered stable since 4.5 unless I
> > misremember and that is not a short time period in my book.
>
> If you read my sentence again, I'm not talking about the kernel but
> the surrounding infrastructure that consumes this data. The risk is
> not dependent on the age of the interface, but on its adoption.

You really have to assume that a user-visible interface is consumed
shortly after it is exposed, or in this case shortly after it is
considered stable, because cgroup v2 was explicitly called unstable for
a considerable period of time. This is the general policy regarding user
APIs in the kernel. I could see an argument for a change in the release
right after the introduction, or in similar cases, but this was 3 years
ago. We already have distribution kernels based on 4.12, and that is old
compared to 5.0.

> > Changing interfaces now represents a non-trivial risk and so far I
> > haven't heard any actual usecase where the current semantic is
> > actually wrong. Inconsistency on its own is not a sufficient
> > justification IMO.
>
> It can be seen either way, and in isolation it wouldn't be wrong to
> count events on the local level. But we made that decision for the
> entire interface, and this file is the odd one out now. From that
> comprehensive perspective, yes, the behavior is wrong.

I do see your point about consistency. But it is also important to
consider the usability of this interface. As already mentioned, catching
an oom event at a level where the oom doesn't happen, and then having a
hard time identifying the place where it did happen without races, is
not a straightforward API to use. So it might really be the case that
the current API is actually usable for its purpose.

> It really
> confuses people who are trying to use it, because they *do* expect it
> to behave recursively.

Then we should improve the documentation. But seriously, these are not
strong reasons to change a long-term semantic that people might rely on.

> I'm really having a hard time believing there are existing cgroup2
> users with specific expectations for the non-recursive behavior...

I can certainly imagine monitoring tools hooking in at the levels where
limits are set and reporting events as they happen. It would be more
than confusing to receive events for reclaim/ooms that haven't happened
at that level just because a delegated memcg down the hierarchy has
decided to set a more restrictive limit. Really, this is a very
unexpected behavior change for anybody using that interface right now on
anything but leaf memcgs.
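
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of what a monitor attached to a non-leaf memcg would see
under each semantic. The hierarchy, the group names and the counter
values are made up for illustration; only the "key value" line format of
memory.events is taken from the real interface.

```python
from dataclasses import dataclass, field


def parse_memory_events(text):
    """Parse the 'key value' lines of a memory.events file into a dict."""
    events = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split()
        events[key] = int(value)
    return events


@dataclass
class Memcg:
    """Hypothetical in-memory model of a memcg and its children."""
    name: str
    local: dict                       # events that happened at this level
    children: list = field(default_factory=list)


def recursive_events(cg):
    """Sum the local counters of cg and of every memcg below it."""
    total = dict(cg.local)
    for child in cg.children:
        for key, value in recursive_events(child).items():
            total[key] = total.get(key, 0) + value
    return total


# Assumed setup: the parent never hit its own limit, while a delegated
# child set a tighter limit and ran into it repeatedly.
child = Memcg(
    "parent/delegated",
    parse_memory_events("low 0\nhigh 0\nmax 12\noom 3\noom_kill 3\n"),
)
parent = Memcg(
    "parent",
    parse_memory_events("low 0\nhigh 0\nmax 0\noom 0\noom_kill 0\n"),
    [child],
)

local_view = parent.local                  # current, non-recursive semantic
recursive_view = recursive_events(parent)  # proposed, recursive semantic

print("local:    ", local_view)
print("recursive:", recursive_view)
```

With the local semantic the monitor on "parent" sees no oom at all. With
the recursive one it sees the child's ooms and then has to walk the
subtree to find out where they actually happened.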
--
Michal Hocko
SUSE Labs