Re: [RFC] how the ballooned memory should be accounted by the drivers inside the guests? (was:[PATCH v6 1/2] Create debugfs file with virtio balloon usage information)

From: Alexander Atanasov
Date: Tue Aug 02 2022 - 04:53:54 EST


Hi,

I put some more people on CC; questions for you are at the end. TIA.

On 01/08/2022 23:12, David Hildenbrand wrote:
/ # cat /sys/kernel/debug/virtio-balloon
inflated: -2097152 kB
What's the rationale of making it negative?

As suggested earlier, the sign indicates how the memory is accounted in the two different cases. Negative means it is subtracted from MemTotal. Positive means it is accounted as used.
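
To make the sign convention concrete, here is a minimal Python sketch of how a reader of the debugfs file could interpret it. The "inflated: <n> kB" layout is taken from the output quoted above; the function name, the returned wording, and the handling of a zero value are illustrative assumptions only, not part of the patch.

```python
def interpret_inflated(line):
    """Parse an 'inflated: <n> kB' line and say how the memory is accounted.

    Hypothetical helper: returns (size_in_kb, description).
    """
    key, sep, value = line.partition(":")
    if not sep or key.strip() != "inflated":
        raise ValueError("not an 'inflated:' line: %r" % line)

    fields = value.split()
    if len(fields) != 2 or fields[1] != "kB":
        raise ValueError("expected '<n> kB', got %r" % value)

    kb = int(fields[0])
    if kb < 0:
        # Negative: the inflated pages were subtracted from MemTotal.
        return -kb, "subtracted from MemTotal"
    if kb > 0:
        # Positive: MemTotal is unchanged, the pages count as used.
        return kb, "accounted as used"
    # Assumption: zero simply means nothing is inflated.
    return 0, "balloon not inflated"


if __name__ == "__main__":
    size_kb, how = interpret_inflated("inflated: -2097152 kB")
    print("%d kB, %s" % (size_kb, how))
```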

To join the threads:

Always account inflated memory as used for both cases - with and
without deflate on oom. Do not change total ram which can confuse
userspace and users.
Sorry, but NAK.
Ok.

This would affect existing users / user space / balloon stats. For example
HV just recently switch to properly using adjust_managed_page_count()

I am wondering what the rationale behind this is; I have never seen users
that expect it to work like this. Do you have any pointers to such users, so
I can understand why they do so?
We adjust total pages and managed pages to simulate what memory is
actually available to the system (just like during memory hot(un)plug).
Even though the pages are "allocated" by the driver, they are actually
unusable for the system, just as if they would have been offlined.
Strictly speaking, the guest OS can kill as many processes as it wants,
it cannot reclaim that memory, as it's logically no longer available.

There is nothing (valid, well, except driver unloading) the guest can do
to reuse these pages. The hypervisor has to get involved first to grant
access to some of these pages again (deflate the balloon).

It's different with deflate-on-oom: the guest will *itself* decide to
reuse inflated pages to deflate them. So the allocated pages can become
back usable easily. There was a recent discussion for virtio-balloon to
change that behavior when it's known that the hypervisor essentially
implements "deflate-on-oom" by looking at guest memory stats and
adjusting the balloon size accordingly; however, as long as we don't
know what the hypervisor does or doesn't do, we have to keep the
existing behavior.
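
The difference between the two modes described above can be illustrated with a toy model in Python. This is not kernel code: the class and its fields are hypothetical and only mimic what the guest would report. Without deflate-on-oom the inflated pages are taken out of the total, as with memory hot unplug; with deflate-on-oom the total is left alone and the pages show up as used.

```python
class GuestMemory:
    """Toy model of guest memory accounting under ballooning (hypothetical)."""

    def __init__(self, total_kb, deflate_on_oom):
        self.total_kb = total_kb      # what MemTotal would report
        self.free_kb = total_kb       # what MemFree would report
        self.inflated_kb = 0          # memory currently held by the balloon
        self.deflate_on_oom = deflate_on_oom

    def inflate(self, kb):
        """The balloon takes kb of free memory away from the guest."""
        if kb < 0 or kb > self.free_kb:
            raise ValueError("cannot inflate by %d kB" % kb)
        self.free_kb -= kb
        self.inflated_kb += kb
        if not self.deflate_on_oom:
            # The guest cannot get these pages back on its own,
            # so they are removed from the total as well.
            self.total_kb -= kb

    @property
    def used_kb(self):
        return self.total_kb - self.free_kb


if __name__ == "__main__":
    for deflate_on_oom in (False, True):
        mem = GuestMemory(total_kb=4194304, deflate_on_oom=deflate_on_oom)
        mem.inflate(2097152)
        print("deflate_on_oom=%s total=%d kB used=%d kB free=%d kB"
              % (deflate_on_oom, mem.total_kb, mem.used_kb, mem.free_kb))
```

In both cases the free memory ends up the same; only the split between "total" and "used" differs, which is exactly the point under discussion.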

Note that most balloon drivers under Linux share that behavior.

In case of Hyper-V I remember a customer BUG report that requested that
exact behavior, however, I'm not able to locate the BZ quickly.
[1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-November/057767.html
(note that I can't easily find the original mail in the archives)

VMware does not, Xen does, HV does (but it didn't) - virtio does both.

For me the confusion comes from mixing ballooning and hot plug.

Ballooning is like a heap inside the guest from which the host can allocate/deallocate pages. Whether there is a mechanism for the guest to ask the host for more pages or to free pages, or whether the host has a heuristic to monitor the guest and inflate/deflate the balloon, is a matter of implementation.

Hot plug adds to MemTotal, and it is not a random event in either a real or a virtual environment, so you can act upon it. MemTotal goes down on hot unplug and when pages get marked as faulty RAM.

Historically MemTotal is a stable value (I agree with most of David Stevens' points): user space expects it to be initialized at startup and does not expect it to change.

Used is what changes, and that is what user space expects to change.
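
As an illustration of why this matters for user space, here is a sketch of a hypothetical monitoring tool that reads MemTotal once at startup and afterwards only re-reads MemFree. The sample values are made up, and the class name is an assumption; the parsing follows the usual "Key:   value kB" layout of /proc/meminfo.

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo-style keys to integer values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


class CachingMonitor:
    """Hypothetical tool that treats MemTotal as fixed after startup."""

    def __init__(self, meminfo_text):
        self.total_kb = parse_meminfo(meminfo_text)["MemTotal"]

    def used_percent(self, meminfo_text):
        free_kb = parse_meminfo(meminfo_text)["MemFree"]
        return 100.0 * (self.total_kb - free_kb) / self.total_kb


def fresh_used_percent(meminfo_text):
    """Same calculation, but re-reading MemTotal every time."""
    info = parse_meminfo(meminfo_text)
    return 100.0 * (info["MemTotal"] - info["MemFree"]) / info["MemTotal"]


# Made-up samples: at boot, and after a 1 GiB inflate that lowered MemTotal.
AT_BOOT = "MemTotal:        4194304 kB\nMemFree:         3145728 kB\n"
INFLATED = "MemTotal:        3145728 kB\nMemFree:         2097152 kB\n"

if __name__ == "__main__":
    monitor = CachingMonitor(AT_BOOT)
    print("cached MemTotal: %.1f%% used" % monitor.used_percent(INFLATED))
    print("fresh MemTotal:  %.1f%% used" % fresh_used_percent(INFLATED))
```

The two numbers disagree as soon as MemTotal moves, which is the kind of confusion a tool that assumes a stable MemTotal would run into.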

Deflate on oom might have been a mistake, but it is there, and if anything depends on a changing MemTotal it will be broken by that option. How can that be fixed?

I agree that the host cannot reclaim what is marked as used, but should it be able to? Maybe it would be good to teach the OOM killer that there can be RAM that cannot be reclaimed.

Note: I suggested under [1] to expose inflated pages via /proc/meminfo
directly. We could do that consistently over all balloon drivers ...
doesn't sound too crazy.

Initially I wanted to do exactly this, BUT:
- some drivers prefer to expose some more internal information in the file.
- a lot of user space is using meminfo, so it is better to keep it as is to avoid breaking something; ballooning is not very frequently used.


Please share your view on how ballooned memory should be accounted by the drivers inside the guests, so we can work towards consistent behaviour:

Should the inflated memory be accounted as Used, or should MemTotal be adjusted?

Should the inflated memory be added to /proc/meminfo?

--
Regards,
Alexander Atanasov