Re: Low TCP throughput due to vmpressure with swap enabled

From: Johannes Weiner
Date: Tue Nov 22 2022 - 15:05:24 EST


On Mon, Nov 21, 2022 at 04:53:43PM -0800, Ivan Babrou wrote:
> Hello,
>
> We have observed negative TCP throughput behavior caused by the following commit:
>
> * 8e8ae645249b mm: memcontrol: hook up vmpressure to socket pressure
>
> It landed back in 2016 in v4.5, so it's not exactly a new issue.
>
> The crux of the issue is that, in some cases with swap present, the
> workload's TCP throughput can be unfairly throttled.

Thanks for the detailed analysis, Ivan.

Originally, we pushed back on sockets only when regular page reclaim
had completely failed and we were about to OOM. This patch was an
attempt to be smarter about it and equalize pressure more smoothly
between socket memory, file cache, and anonymous pages.
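
For reference, the hook works roughly like this (simplified from
mm/vmpressure.c; details vary a bit between kernel versions): when a
reclaim sampling window completes with a bad scanned:reclaimed ratio,
we arm a one-second socket pressure window on the memcg:

	/* mm/vmpressure.c, vmpressure(), socket pressure path */
	level = vmpressure_calc_level(scanned, reclaimed);
	if (level > VMPRESSURE_LOW) {
		/*
		 * Tell the socket allocator that LRU reclaim is
		 * struggling; keep the state asserted for a second
		 * for hysteresis.
		 */
		WRITE_ONCE(memcg->socket_pressure, jiffies + HZ);
	}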

After a recent discussion with Shakeel, I'm no longer quite sure the
kernel is the right place to attempt this sort of balancing. Which
type of memory is more important really depends on the workload. And
your report shows that vmpressure is a flawed mechanism for
implementing this anyway.

So I'm thinking we should delete the vmpressure hook and go back to
throttling sockets only when an OOM is imminent. This is in line with
what we do at the system level: sockets get throttled only after
reclaim fails and we hit hard limits. It's then up to users and
sysadmins to allocate a reasonable amount of buffer space within the
overall memory budget.
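
For context, the consumer of that signal is the network side, e.g.
tcp_under_memory_pressure(): a memcg under socket pressure makes TCP
behave as if the global tcp_mem limits were hit, suppressing send
buffer growth and receive window expansion. Simplified from
include/net/tcp.h:

	static inline bool tcp_under_memory_pressure(const struct sock *sk)
	{
		if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
		    mem_cgroup_under_socket_pressure(sk->sk_memcg))
			return true;

		return READ_ONCE(tcp_memory_pressure);
	}

Removing the vmpressure hook only changes when
mem_cgroup_under_socket_pressure() reports pressure; this callsite
stays the same.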

Cgroup accounting, limiting, and OOM enforcement are still in place
for the socket buffers, so misbehaving groups will be contained
either way.
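
(That containment comes from the charging path: every page of socket
buffer memory is charged to the owning memcg, and the allocation is
suppressed when the charge fails against the hard limit, regardless
of vmpressure. Roughly, from __sk_mem_raise_allocated() in
net/core/sock.c - the mem_cgroup_charge_skmem() signature has changed
over versions:)

	sk_memory_allocated_add(sk, amt);
	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
	    !mem_cgroup_charge_skmem(sk->sk_memcg, amt,
				     gfp_memcg_charge()))
		goto suppress_allocation;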

What do you think? Something like the below patch?

---