Re: [PATCH v2] virtio_balloon: Fix endless deflation and inflation on arm64

From: David Hildenbrand
Date: Mon Oct 02 2023 - 07:51:36 EST


On 25.09.23 01:58, Gavin Shan wrote:
Hi David and Michael,

On 8/31/23 11:10, Gavin Shan wrote:
The deflation request to the target, which isn't unaligned to the
guest page size causes endless deflation and inflation actions. For
example, we receive the flooding QMP events for the changes on memory
balloon's size after a deflation request to the unaligned target is
sent for the ARM64 guest, where we have 64KB base page size.

/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host -cpu host \
-smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=512M \
-object memory-backend-ram,id=mem1,size=512M \
-numa node,nodeid=0,memdev=mem0,cpus=0-3 \
-numa node,nodeid=1,memdev=mem1,cpus=4-7 \
: \
-device virtio-balloon-pci,id=balloon0,bus=pcie.10

{ "execute" : "balloon", "arguments": { "value" : 1073672192 } }
{"return": {}}
{"timestamp": {"seconds": 1693272173, "microseconds": 88667}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272174, "microseconds": 89704}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272175, "microseconds": 90819}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272176, "microseconds": 91961}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272177, "microseconds": 93040}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
{"timestamp": {"seconds": 1693272178, "microseconds": 94117}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
{"timestamp": {"seconds": 1693272179, "microseconds": 95337}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272180, "microseconds": 96615}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
{"timestamp": {"seconds": 1693272181, "microseconds": 97626}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272182, "microseconds": 98693}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
{"timestamp": {"seconds": 1693272183, "microseconds": 99698}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272184, "microseconds": 100727}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272185, "microseconds": 90430}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{"timestamp": {"seconds": 1693272186, "microseconds": 102999}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
:
<The similar QMP events repeat>

Fix it by aligning the target up to the guest page size, 64KB in this
specific case. With this applied, no flooding QMP events are observed
and the memory balloon's size can be stablizied to 0x3ffe0000 soon
after the deflation request is sent.

{ "execute" : "balloon", "arguments": { "value" : 1073672192 } }
{"return": {}}
{"timestamp": {"seconds": 1693273328, "microseconds": 793075}, \
"event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
{ "execute" : "query-balloon" }
{"return": {"actual": 1073610752}}

Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
Tested-by: Zhenyu Zhang <zhenyzha@xxxxxxxxxx>
---
v2: Align @num_pages up to the guest page size in towards_target()
directly as David suggested.
---
drivers/virtio/virtio_balloon.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)


If the patch looks good, could you please merge this to Linux 6.6.rc4 since
it's something needed by our downstream. I hope it can land upstream as early
as possible, thanks a lot.

@MST, I cannot spot it in your usual vhost git yet. Should I pick it up or what are your plans?

--
Cheers,

David / dhildenb