Re: [PATCH] virtio_balloon: stay awake while adjusting balloon

From: David Stevens
Date: Wed Dec 13 2023 - 23:14:06 EST


On Wed, Dec 13, 2023 at 5:44 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 11.12.23 12:43, David Stevens wrote:
> > From: David Stevens <stevensd@xxxxxxxxxxxx>
> >
>
> Hi David,
>
> > Add a wakeup event for when the balloon is inflating or deflating.
> > Userspace can enable this wakeup event to prevent the system from
> > suspending while the balloon is being adjusted. This allows
> > /sys/power/wakeup_count to be used without breaking virtio_balloon's
> > cooperative memory management.
>
> Can you add/share some more details

I'm working on enabling support for Linux s2Idle in our Android
virtual machine, to restrict apps from running in the background
without holding an Android partial wakelock. With the patch I recently
sent out [1], since crosvm advertises native PCI power management for
virtio devices, the Android guest can properly enter s2idle, and it
can be woken up by incoming IO. However, one of the remaining problems
is that when the host needs to reclaim memory from the guest via the
virtio-balloon, there is nothing preventing the guest from entering
s2idle before the balloon driver finishes returning memory to the
host.

One alternative to this approach would be to add a virtballoon_suspend
callback to abort suspend if the balloon is inflating/adjusting.
However, it seems cleaner to just prevent suspend in the first place.

[1] https://lore.kernel.org/lkml/20231208070754.3132339-1-stevensd@xxxxxxxxxxxx/

> >
> > Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx>
> > ---
> > drivers/virtio/virtio_balloon.c | 59 +++++++++++++++++++++++++++------
> > 1 file changed, 49 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > index 1fe93e93f5bc..811d8937246a 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -119,6 +119,11 @@ struct virtio_balloon {
> > /* Free page reporting device */
> > struct virtqueue *reporting_vq;
> > struct page_reporting_dev_info pr_dev_info;
> > +
> > + /* State for keeping the wakeup_source active while adjusting the balloon */
> > + spinlock_t adjustment_lock;
> > + u32 adjustment_seqno;
>
> Using a simple flag that gets set when updating the balloon size and
> test-and-clear when testing for changes should be easier to get.
>
> bool adjustment_balloon_size_changed;
>
> or sth like that.

That's a good simplification, thanks.

> > + bool adjustment_in_progress;
> > };
> >
> > static const struct virtio_device_id id_table[] = {
> > @@ -437,6 +442,31 @@ static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
> > queue_work(vb->balloon_wq, &vb->report_free_page_work);
> > }
> >
> > +static void start_update_balloon_size(struct virtio_balloon *vb)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&vb->adjustment_lock, flags);
> > + vb->adjustment_seqno++;
> > + if (!vb->adjustment_in_progress) {
> > + vb->adjustment_in_progress = true;
> > + pm_stay_awake(&vb->vdev->dev);
> > + }
> > + spin_unlock_irqrestore(&vb->adjustment_lock, flags);
> > +
> > + queue_work(system_freezable_wq, &vb->update_balloon_size_work);
> > +}
> > +
> > +static void end_update_balloon_size(struct virtio_balloon *vb, u32 seqno)
> > +{
> > + spin_lock(&vb->adjustment_lock);
> > + if (vb->adjustment_seqno == seqno && vb->adjustment_in_progress) {
> > + vb->adjustment_in_progress = false;
> > + pm_relax(&vb->vdev->dev);
> > + }
> > + spin_unlock(&vb->adjustment_lock);
> > +}
> > +
> > static void virtballoon_changed(struct virtio_device *vdev)
> > {
> > struct virtio_balloon *vb = vdev->priv;
> > @@ -444,8 +474,7 @@ static void virtballoon_changed(struct virtio_device *vdev)
> >
> > spin_lock_irqsave(&vb->stop_update_lock, flags);
> > if (!vb->stop_update) {
> > - queue_work(system_freezable_wq,
> > - &vb->update_balloon_size_work);
> > + start_update_balloon_size(vb);
> > virtio_balloon_queue_free_page_work(vb);
> > }
> > spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> > @@ -473,22 +502,29 @@ static void update_balloon_size_func(struct work_struct *work)
> > {
> > struct virtio_balloon *vb;
> > s64 diff;
> > + u32 seqno;
> >
> > vb = container_of(work, struct virtio_balloon,
> > update_balloon_size_work);
> > - diff = towards_target(vb);
> >
> > - if (!diff)
> > - return;
> > + spin_lock(&vb->adjustment_lock);
> > + seqno = vb->adjustment_seqno;
> > + spin_unlock(&vb->adjustment_lock);
> >
> > - if (diff > 0)
> > - diff -= fill_balloon(vb, diff);
> > - else
> > - diff += leak_balloon(vb, -diff);
> > - update_balloon_size(vb);
> > + diff = towards_target(vb);
> > +
> > + if (diff) {
> > + if (diff > 0)
> > + diff -= fill_balloon(vb, diff);
> > + else
> > + diff += leak_balloon(vb, -diff);
> > + update_balloon_size(vb);
> > + }
> >
> > if (diff)
> > queue_work(system_freezable_wq, work);
> > + else
> > + end_update_balloon_size(vb, seqno);
>
> What if we stop the workqueue and unload the driver -- see
> remove_common() -- won't you leave pm_stay_awake() wrongly set?

When a device gets removed, its wakeup source is destroyed, which
automatically calls __pm_relax.

> > }
> >
> > static int init_vqs(struct virtio_balloon *vb)
> > @@ -992,6 +1028,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
> > goto out_unregister_oom;
> > }
> >
> > + spin_lock_init(&vb->adjustment_lock);
> > + device_set_wakeup_capable(&vb->vdev->dev, true);
>
>
> I'm a bit confused: Documentation/driver-api/pm/devices.rst documents
>
> "
> The :c:member:`power.can_wakeup` flag just records whether the device
> (and its driver) can physically support wakeup events. The
> :c:func:`device_set_wakeup_capable()` routine affects this flag.
> "
>
> ...
>
> "
> Whether or not a device is capable of issuing wakeup events is a
> hardware matter, and the kernel is responsible for keeping track of it.
> "
>
> But how is the virtio-balloon device capable of waking up the machine?
> Your patch merely implies that the virtio-baloon device is capable to
> prohbit going to sleep.
>
> What am I missing?

The underlying virtio_pci_device is capable of waking up the machine,
if it supports PCI power management. The core PCI code will keep the
machine awake while processing the interrupt (i.e. during
virtballoon_changed), but after processing is handed off to the
virtio-balloon driver, the virtio-balloon driver needs to keep the
machine awake until the processing is actually completed.

An alternative to making vb->vdev->dev wakeup capable is to plumb the
pm_stay_awake/pm_relax calls to the underlying virtio_pci_device. Would
that be a preferable approach?

-David