Re: [PATCH 1/2] mm: add MemAvailable to per-node meminfo

From: Qi Zheng
Date: Thu Dec 16 2021 - 10:43:30 EST




On 12/16/21 11:37 PM, Greg KH wrote:
On Thu, Dec 16, 2021 at 11:31:36PM +0800, Qi Zheng wrote:


On 12/16/21 9:16 PM, Greg KH wrote:
On Thu, Dec 16, 2021 at 08:46:54PM +0800, Qi Zheng wrote:
In /proc/meminfo, we can show the sum of all the available memory
as "MemAvailable". Add the same counter also to per-node meminfo
under /sys.

With this counter, some processes that bind nodes can make some
decisions by reading the "MemAvailable" of the corresponding nodes
directly.

Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
---
drivers/base/node.c | 4 ++++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 5 +++++
mm/page_alloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 59 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 87acc47e8951..deb2a7965ae4 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -375,8 +375,10 @@ static ssize_t node_read_meminfo(struct device *dev,
struct sysinfo i;
unsigned long sreclaimable, sunreclaimable;
unsigned long swapcached = 0;
+ long available;
si_meminfo_node(&i, nid);
+ available = si_mem_available_node(&i, nid);
sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
#ifdef CONFIG_SWAP
@@ -386,6 +388,7 @@ static ssize_t node_read_meminfo(struct device *dev,
"Node %d MemTotal: %8lu kB\n"
"Node %d MemFree: %8lu kB\n"
"Node %d MemUsed: %8lu kB\n"
+ "Node %d MemAvailable: %8lu kB\n"

You just changed a user/kernel api without documenting it anywhere, or
ensuring that you did not just break anything.

Hi greg k-h,

The MemAvailable has long existed in the /proc/meminfo, it's meaning
has been described in the Documentation/filesystems/proc.rst. Since
the semantics of per-node MemAvailable has not changed, so I did not
add a new document description.

This is not a proc file, it is in sysfs.

And it violates all of the sysfs rules, and has all of the problems that
proc files have. So the worst of both worlds :(

Also, this api is crazy, and not ok, please never add anything new to
it, it is broken as-is.

The consideration of adding per-node MemAvailable is that some processes
that bind nodes need this information to do some decisions.

Now their approach is to read other information in per-node meminfo
and /proc/sys/vm/watermark_scale_factor, and then approximate this
value. With this counter, they can directly read
/sys/devices/system/node/node*/meminfo to get the MemAvailable
information of each node.

And MemTotal, MemFree and SReclaimable(etc.) all have corresponding
per-node versions, so I think that adding per-node MemAvailable might
also make sense. :)

Please no, I do not want new things added to this file, as you might
break parsers of this file.

OK, Got it.

Thank,
Qi


Also, again, you did not document this at all in Documentation/ABI/ so
for that reason alone it is not ok.

thanks,

greg k-h


--
Thanks,
Qi