Re: [PATCH v2 4/4] memcg: synchronously enforce memory.high for large overcharges

From: Chris Down
Date: Wed Feb 16 2022 - 08:12:40 EST

Shakeel Butt writes:

>> Thanks, I was going to comment on v1 that I prefer to keep the implementation
>> of mem_cgroup_handle_over_high if possible since we know that the mechanism has
>> been safe in production over the past few years.
>>
>> One question I have is about throttling. It looks like this new
>> mem_cgroup_handle_over_high callsite may mean that throttling is invoked more
>> than once on a misbehaving workload that's failing to reclaim since the
>> throttling could be invoked both here and in return to userspace, right? That
>> might not be a problem, but we should think about the implications of that,
>> especially in relation to MEMCG_MAX_HIGH_DELAY_JIFFIES.
>
> Please note that mem_cgroup_handle_over_high() clears
> memcg_nr_pages_over_high and if on the return-to-userspace path
> mem_cgroup_handle_over_high() finds that memcg_nr_pages_over_high is
> non-zero, then it means the task has further accumulated the charges
> over high limit after a possibly synchronous
> mem_cgroup_handle_over_high() call.
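
For readers following the thread, the counter semantics described here can be
sketched in plain userspace C. This is only a toy model under stated
assumptions: struct task_model and the model_* functions are hypothetical
names, and the real mem_cgroup_handle_over_high() also reclaims and may sleep,
which is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-task state; this is not the kernel's task_struct. */
struct task_model {
	unsigned int nr_pages_over_high; /* models current->memcg_nr_pages_over_high */
	unsigned int throttle_events;    /* number of times the handler did real work */
};

/* Models a charge that pushes the cgroup further over memory.high. */
static void model_charge_over_high(struct task_model *t, unsigned int nr_pages)
{
	t->nr_pages_over_high += nr_pages;
}

/*
 * Models the counter handling in mem_cgroup_handle_over_high(): consume and
 * clear the per-task counter. Returns true if there was anything to handle.
 */
static bool model_handle_over_high(struct task_model *t)
{
	unsigned int nr_pages = t->nr_pages_over_high;

	if (!nr_pages)
		return false; /* nothing accumulated since the last call */

	t->nr_pages_over_high = 0;
	t->throttle_events++;
	assert(t->nr_pages_over_high == 0); /* the counter is always left cleared */
	return true;
}
```

In this model, calling model_handle_over_high() twice in a row makes the second
call a no-op. It only does work again if model_charge_over_high() ran in
between, which is the case described for the return-to-userspace path.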

Oh sure, my point was only that MEMCG_MAX_HIGH_DELAY_JIFFIES was intended to
more reliably ensure that we return to userspace at some point in the near
future. That gives the task another chance at good behaviour instead of being
immediately whacked by whatever is monitoring PSI. For example, consider a
daemon which monitors its own PSI contributions and proactively attempts to
free structures in userspace.

That said, the throttling here still isn't unbounded, and it's not likely that
anyone doing such large allocations after already exceeding memory.high is
being a good citizen, so I think the patch makes sense as long as the change is
understood and documented internally.
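
To make the "still isn't unbounded" point concrete, here is a rough userspace
sketch, not kernel code. It assumes each throttling invocation clamps its own
sleep to a per-call cap, so a synchronous call plus a later return-to-userspace
call can together exceed one cap but never the number of calls times the cap.
MODEL_HZ, the two-second cap, and the model_* names are illustrative
assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HZ 1000UL                                  /* assumed tick rate */
#define MODEL_MAX_HIGH_DELAY_JIFFIES (2UL * MODEL_HZ)    /* assumed per-call cap */

/* Clamp one invocation's computed penalty to the per-call cap. */
static unsigned long model_clamp_delay(unsigned long penalty_jiffies)
{
	return penalty_jiffies < MODEL_MAX_HIGH_DELAY_JIFFIES
		? penalty_jiffies : MODEL_MAX_HIGH_DELAY_JIFFIES;
}

/* Sum the clamped delays over n throttling invocations. */
static unsigned long model_total_delay(const unsigned long *penalties, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += model_clamp_delay(penalties[i]);

	/* The total is bounded by the invocation count times the cap. */
	assert(total <= n * MODEL_MAX_HIGH_DELAY_JIFFIES);
	return total;
}
```

With two invocations, the worst case in this model is twice the cap, which is
longer than before the patch but still finite.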

Thanks!

Acked-by: Chris Down <chris@xxxxxxxxxxxxxx>
