Re: [PATCH] virtio_balloon: Fix endless deflation and inflation on arm64

From: Gavin Shan
Date: Wed Aug 30 2023 - 20:56:32 EST


On 8/31/23 02:30, David Hildenbrand wrote:
On 29.08.23 03:54, Gavin Shan wrote:
A deflation request to a target that isn't aligned to the guest page
size causes endless deflation and inflation actions. For example, we
receive a flood of QMP events for changes to the memory balloon's size
after a deflation request to an unaligned target is sent to an ARM64
guest, which has a 64KB base page size.

   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
   -accel kvm -machine virt,gic-version=host -cpu host          \
   -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
   -m 1024M,slots=16,maxmem=64G                                 \
   -object memory-backend-ram,id=mem0,size=512M                 \
   -object memory-backend-ram,id=mem1,size=512M                 \
   -numa node,nodeid=0,memdev=mem0,cpus=0-3                     \
   -numa node,nodeid=1,memdev=mem1,cpus=4-7                     \
     :                                                          \
   -device virtio-balloon-pci,id=balloon0,bus=pcie.10

   { "execute" : "balloon", "arguments": { "value" : 1073672192 } }
   {"return": {}}
   {"timestamp": {"seconds": 1693272173, "microseconds": 88667},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272174, "microseconds": 89704},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272175, "microseconds": 90819},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272176, "microseconds": 91961},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272177, "microseconds": 93040},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
   {"timestamp": {"seconds": 1693272178, "microseconds": 94117},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
   {"timestamp": {"seconds": 1693272179, "microseconds": 95337},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272180, "microseconds": 96615},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
   {"timestamp": {"seconds": 1693272181, "microseconds": 97626},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272182, "microseconds": 98693},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
   {"timestamp": {"seconds": 1693272183, "microseconds": 99698},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272184, "microseconds": 100727},  \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272185, "microseconds": 90430},   \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   {"timestamp": {"seconds": 1693272186, "microseconds": 102999},  \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}}
      :
   <The similar QMP events repeat>

Fix it by aligning the target to the guest page size, 64KB in this
specific case. With this applied, no flood of QMP events is observed
and the memory balloon's size stabilizes at 0x3ffe0000 soon after the
deflation request is sent.

   { "execute" : "balloon", "arguments": { "value" : 1073672192 } }
   {"return": {}}
   {"timestamp": {"seconds": 1693273328, "microseconds": 793075},  \
    "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}}
   { "execute" : "query-balloon" }
   {"return": {"actual": 1073610752}}

Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
---
  drivers/virtio/virtio_balloon.c | 13 ++++++++++++-
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5b15936a5214..625caac35264 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -386,6 +386,17 @@ static void stats_handle_request(struct virtio_balloon *vb)
      virtqueue_kick(vq);
  }
+static inline s64 align_pages_up(s64 diff)
+{
+    if (diff == 0)
+        return diff;
+
+    if (diff > 0)
+        return ALIGN(diff, VIRTIO_BALLOON_PAGES_PER_PAGE);
+
+    return -ALIGN(-diff, VIRTIO_BALLOON_PAGES_PER_PAGE);
+}
+
  static inline s64 towards_target(struct virtio_balloon *vb)
  {
      s64 target;
@@ -396,7 +407,7 @@ static inline s64 towards_target(struct virtio_balloon *vb)
              &num_pages);
      target = num_pages;
-    return target - vb->num_pages;

We know that vb->num_pages is always a multiple of VIRTIO_BALLOON_PAGES_PER_PAGE.

Why not simply align target down?

target = ALIGN(num_pages, VIRTIO_BALLOON_PAGES_PER_PAGE);
return target - vb->num_pages;


Good point. Thanks a lot, David. The code will be changed to what's suggested in
v2, to be posted soon. I will also add a comment to explain it a bit. Note that ALIGN()
aligns up rather than down. This is intentional: it gives bias to deflation and avoids
overrunning the machine's memory size if it's not aligned to 64KB. Furthermore,
align-up causes deflation even if the user requests only a 4KB diff, whereas the outcome
of ALIGN_DOWN(4KB, 64KB) is zero and no deflation would be triggered.

Thanks,
Gavin