Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure

From: Abel Wu
Date: Fri Jul 28 2023 - 08:45:44 EST


On 7/27/23 8:19 AM, Roman Gushchin wrote:
On Wed, Jul 26, 2023 at 04:44:24PM +0800, Abel Wu wrote:
On 7/26/23 10:56 AM, Roman Gushchin wrote:
On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
Hi Roman, thanks for taking time to have a look!

Overall I think it's a good idea to clean these things up and thank you
for working on this. But I wonder if we can make the next step and leave only
one mechanism for both cgroup v1 and v2 instead of having this weird setup
where memcg->socket_pressure is set differently from different paths on cgroup
v1 and v2.

There is some difficulty in unifying the mechanism for both cgroup
designs. Throttling socket memory allocation when memcg is under
pressure only makes sense when socket memory and other usages are
sharing the same limit, which is not true for cgroupv1. Thoughts?

I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
as it is, except when it creates an unnecessary complexity in the code.

Are you suggesting that the 2nd patch can be ignored and keep
->tcpmem_pressure as it is? Or keep the 2nd patch and add some
explanation around as you suggested in last reply?

I suggest to split a code refactoring (which is not expected to bring any
functional changes) and an actual change of the behavior on cgroup v1.
Re the refactoring: I see a lot of value in adding comments and make the
code more readable, I don't see that much value in merging two variables.
But if it comes organically with the code simplification - nice.

I see, thanks for the clarification!


I'm curious, was your work driven by some real-world problem or a desire to clean
up the code? Both are valid reasons of course.

We (a cloud service provider) are migrating users to cgroupv2,
but encountered some problems among which the socket memory
really puts us in a difficult situation. There is no specific
threshold for socket memory in cgroupv2 and relies largely on
workloads doing traffic control themselves.

Say one workload behaves fine in cgroupv1 with 10G of ->memory
and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2
with 11G of ->memory due to burst memory usage on socket.

It's rational for the workloads to build some traffic control
to better utilize the resources they bought, but from kernel's
point of view it's also reasonable to suppress the allocation
of socket memory once there is a shortage of free memory, given
that performance degradation is better than failure.

Yeah, I can see it. But Idk if it's too workload-specific to have
a single-policy-fits-all-cases approach.
E.g. some workloads might prefer to have a portion of pagecache
being reclaimed.
What do you think?

Now the memcg is considered to be under pressure if the number of
pages reclaimed is much less than desired. I doubt it could be a
win in such case to spend more time on reclaiming while letting
socket continue to allocate memory (which could make things worse),
compared to relieving reclaim pressure and putting time on its real
work.

Best,
Abel