RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

From: Parav Pandit
Date: Wed Sep 20 2023 - 03:11:08 EST



> From: Zhu, Lingshan <lingshan.zhu@xxxxxxxxx>
> Sent: Wednesday, September 20, 2023 12:37 PM

> > The problem to overcome in [1] is, resume operation needs to be synchronous
> > as it involves large part of context to resume back, and hence just
> > asynchronously setting DRIVER_OK is not enough.
> > The sw must verify back that device has resumed the operation and ready to
> > answer requests.
> this is not live migration, all device status and other information still stay in the
> device, no need to "resume" context, just resume running.
>
I am aware that it is not live migration. :)

"Just resuming" involves a lot of device setup tasks. The device implementation does not know for how long a device will be suspended.
For example, if a VM is suspended for 6 hours, the device context could have been saved to a slow disk.
Hence, when the resume is done, the device needs to set things up again, and the driver has to verify that the device has actually resumed before accessing it further.

> Like resume from a failed LM.
> >
> > This is slightly different flow than setting the DRIVER_OK for the first time
> > device initialization sequence as it does not involve large restoration.
> >
> > So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver
> > should clear the SUSPEND bit and verify that it is out of SUSPEND.
> >
> > Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
>
Yes, this is why either DRIVER_OK validation by the driver is needed, or Jiqian's new synchronous register.
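
To illustrate the kind of driver-side validation meant here, below is a rough sketch in C. It is not taken from the spec or from the patch: struct sim_device, dev_write_status(), dev_read_status() and driver_resume_and_verify() are hypothetical names, and the device is simulated in memory so that it needs a few status reads before its context is restored. Only the DRIVER_OK bit value (4) comes from the virtio spec. The point is simply that the driver writes the status and then reads it back until the device reports DRIVER_OK, instead of assuming the resume completed immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* DRIVER_OK device status bit, value as defined in the virtio spec. */
#define VIRTIO_CONFIG_S_DRIVER_OK 0x04

/*
 * Hypothetical simulated device (an assumption for illustration only).
 * It makes a written status visible only after restore_polls_left
 * reads, modelling a slow restoration of the device context.
 */
struct sim_device {
    uint8_t status;              /* status currently visible to the driver */
    uint8_t pending;             /* last status written by the driver */
    unsigned restore_polls_left; /* reads remaining until restore is done */
};

/* Driver writes the device status; the device applies it later. */
static void dev_write_status(struct sim_device *d, uint8_t v)
{
    d->pending = v;
}

/* Driver reads the device status; each read advances the simulated restore. */
static uint8_t dev_read_status(struct sim_device *d)
{
    if (d->restore_polls_left > 0)
        d->restore_polls_left--;
    else
        d->status = d->pending;
    return d->status;
}

/*
 * Set DRIVER_OK, then read the status back until the device reports it.
 * Returns true once DRIVER_OK is observed, false if max_polls reads
 * pass without the device reporting it.
 */
static bool driver_resume_and_verify(struct sim_device *d, unsigned max_polls)
{
    dev_write_status(d, d->status | VIRTIO_CONFIG_S_DRIVER_OK);

    for (unsigned i = 0; i < max_polls; i++) {
        if (dev_read_status(d) & VIRTIO_CONFIG_S_DRIVER_OK)
            return true;
    }
    return false;
}
```

A caller would only start issuing requests to the device after driver_resume_and_verify() returns true; a real driver would also sleep or yield between reads and treat a timeout as a failed resume.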