Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

From: Kamezawa Hiroyuki
Date: Tue Dec 15 2015 - 04:30:33 EST


On 2015/12/15 17:30, Vladimir Davydov wrote:
On Tue, Dec 15, 2015 at 12:12:40PM +0900, Kamezawa Hiroyuki wrote:
On 2015/12/15 0:30, Michal Hocko wrote:
On Thu 10-12-15 14:39:14, Vladimir Davydov wrote:
In the legacy hierarchy we charge memsw, which is dubious, because:

- memsw.limit must be >= memory.limit, so it is impossible to limit
swap usage to less than memory usage. Taking into account the fact that
the primary limiting mechanism in the unified hierarchy is
memory.high while memory.limit is either left unset or set to a very
large value, moving the memsw.limit knob to the unified hierarchy would
effectively make it impossible to limit swap usage according to the
user preference.

- memsw.usage != memory.usage + swap.usage, because a page occupying
both a swap entry and a swap cache page is charged only once to the memsw
counter. As a result, it is possible to effectively eat up to
memory.limit of memory pages *and* memsw.limit of swap entries, which
looks unexpected.
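
The second point can be illustrated with a toy model. This is only a sketch of the accounting described above, not kernel code: the Page class and the three helper functions are hypothetical names introduced for illustration, and the model assumes each page is either resident, backed by a swap entry, or both (a swap cache page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for a page owned by a cgroup (not a kernel type)."""
    in_memory: bool        # page is resident, possibly as a swap cache page
    has_swap_entry: bool   # a swap entry is allocated for the page


def memory_usage(pages):
    # Counts resident pages.
    return sum(p.in_memory for p in pages)


def swap_usage(pages):
    # Counts swap entries in use.
    return sum(p.has_swap_entry for p in pages)


def memsw_usage(pages):
    # Toy version of the legacy rule: a page holding both a swap entry
    # and a swap cache page is charged only once.
    return sum(p.in_memory or p.has_swap_entry for p in pages)


pages = [
    Page(in_memory=True, has_swap_entry=False),   # plain resident page
    Page(in_memory=True, has_swap_entry=True),    # swap cache page
    Page(in_memory=False, has_swap_entry=True),   # swapped-out page
]

print(memory_usage(pages), swap_usage(pages), memsw_usage(pages))
```

In this model the swap cache page contributes to both memory usage and swap usage but only once to the combined counter, so the combined figure is smaller than the sum of the other two.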

That said, we should provide a different swap limiting mechanism for
cgroup2.
This patch adds a mem_cgroup->swap counter, which charges the actual
number of swap entries used by a cgroup. It is only charged in the
unified hierarchy, while the legacy hierarchy memsw logic is left
intact.
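
As a rough sketch of what a dedicated swap counter means in practice, the toy class below charges one unit per swap entry against its own limit, independent of any memory limit. It is not the patch's implementation: ToyCgroup, try_charge_swap and uncharge_swap are hypothetical names, and the real counter lives in the kernel's struct mem_cgroup.

```python
class ToyCgroup:
    """Hypothetical model of a cgroup with a separate swap counter."""

    def __init__(self, swap_limit):
        self.swap_limit = swap_limit  # maximum number of swap entries allowed
        self.swap = 0                 # swap entries currently charged

    def try_charge_swap(self, nr_entries=1):
        # Refuse the charge if it would exceed the swap limit.
        if self.swap + nr_entries > self.swap_limit:
            return False
        self.swap += nr_entries
        return True

    def uncharge_swap(self, nr_entries=1):
        # Release entries, never dropping below zero.
        self.swap = max(0, self.swap - nr_entries)


cg = ToyCgroup(swap_limit=2)
results = [cg.try_charge_swap() for _ in range(3)]
print(results, cg.swap)
```

Because the limit applies to swap entries alone, it can be set below the memory limit, which the memsw scheme does not allow. A refused charge in this model corresponds to the case raised in the reply that follows, where anon pages of the cgroup can no longer be swapped out.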

I agree that the previous semantic was awkward. The problem I can see
with this approach is that once the swap limit is reached the anon
memory pressure might spill over to other and unrelated memcgs during
global memory pressure. I guess this is what Kame meant when he said
that anon would basically become mlocked. This would be even more of an issue
with resource delegation to sub-hierarchies because nobody will prevent
setting the swap amount to a small value and using that as an anon memory
protection.

I guess this was the reason why this approach hasn't been chosen before.

Yes. At that time, "never break global VM" was the policy. And "mlock" can be
used to attack the system.

If we are talking about "attacking the system" from inside a container,
there are much easier and more disruptive ways, e.g. running a fork-bomb or
creating pipes - such memory can't be reclaimed and the global OOM killer
won't help.

You're right. We just wanted to avoid affecting global memory reclaim by
each cgroup's settings.

Thanks,
-Kame


