Re: [RFC V1 00/13] vdpa live update

From: Steven Sistare
Date: Wed Jan 17 2024 - 15:33:02 EST


On 1/10/2024 9:55 PM, Jason Wang wrote:
> On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare <steven.sistare@xxxxxxxxxx> wrote:
>>
>> Live update is a technique wherein an application saves its state, exec's
>> to an updated version of itself, and restores its state. Clients of the
>> application experience a brief suspension of service, on the order of
>> 100's of milliseconds, but are otherwise unaffected.
>>
>> Define and implement interfaces that allow vdpa devices to be preserved
>> across fork or exec, to support live update for applications such as qemu.
>> The device must be suspended during the update, but its dma mappings are
>> preserved, so the suspension is brief.
>>
>> The VHOST_NEW_OWNER ioctl transfers device ownership and pinned memory
>> accounting from one process to another.
>>
>> The VHOST_BACKEND_F_NEW_OWNER backend capability indicates that
>> VHOST_NEW_OWNER is supported.
>>
>> The VHOST_IOTLB_REMAP message type updates a dma mapping with its userland
>> address in the new process.
>>
>> The VHOST_BACKEND_F_IOTLB_REMAP backend capability indicates that
>> VHOST_IOTLB_REMAP is supported and required. Some devices do not
>> require it, because the userland address of each dma mapping is discarded
>> after being translated to a physical address.
>>
>> Here is a pseudo-code sequence for performing live update, based on
>> suspend + reset because resume is not yet available. The vdpa device
>> descriptor, fd, remains open across the exec.
>>
>> ioctl(fd, VHOST_VDPA_SUSPEND)
>> ioctl(fd, VHOST_VDPA_SET_STATUS, 0)
>> exec
>
> Is there a userspace implementation as a reference?

I have working patches for qemu that use these ioctls, but they depend on other
qemu cpr patches that are still a work in progress and not yet posted. I'm working
on that.

>> ioctl(fd, VHOST_NEW_OWNER)
>>
>> issue ioctls to re-create vrings
>>
>> if VHOST_BACKEND_F_IOTLB_REMAP
>>     foreach dma mapping
>>         write(fd, {VHOST_IOTLB_REMAP, new_addr})
>
> I think I need to understand the advantages of this approach. For
> example, why it is better than
>
> ioctl(VHOST_RESET_OWNER)
> exec
>
> ioctl(VHOST_SET_OWNER)
>
> for each dma mapping
>     ioctl(VHOST_IOTLB_UPDATE)

That is slower. VHOST_RESET_OWNER unbinds physical pages, and VHOST_IOTLB_UPDATE
rebinds them. That costs multiple seconds for large memories, and the cost is incurred
during the virtual machine's pause time in live update. For comparison, the total pause
time for live update with vfio interfaces is ~100 milliseconds.

However, the interaction with userland is so similar that the same code paths can be used.
In my qemu prototype, after cpr exec's new qemu:
- vhost_vdpa_set_owner() calls VHOST_NEW_OWNER instead of VHOST_SET_OWNER
- vhost_vdpa_dma_map() sets type VHOST_IOTLB_REMAP instead of VHOST_IOTLB_UPDATE
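
The selection described above could be sketched roughly as follows. This is only
an illustration, not the prototype's code: the struct, the cpr_restored flag, and
the helper names are hypothetical, and the enum values are symbolic stand-ins
rather than the real request codes from <linux/vhost.h> or from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Symbolic stand-ins for the requests named in this thread (assumption:
 * real code would use the uapi definitions, plus the ones this RFC adds). */
enum owner_req  { REQ_VHOST_SET_OWNER, REQ_VHOST_NEW_OWNER };
enum iotlb_type { MSG_VHOST_IOTLB_UPDATE, MSG_VHOST_IOTLB_REMAP };

/* Hypothetical per-device state. */
struct vdpa_dev {
    bool cpr_restored;        /* true in the process that inherited the fd across exec */
    bool backend_iotlb_remap; /* VHOST_BACKEND_F_IOTLB_REMAP was offered by the backend */
};

/* Same call site for both cases; only the request differs. */
enum owner_req pick_owner_req(const struct vdpa_dev *d)
{
    return d->cpr_restored ? REQ_VHOST_NEW_OWNER : REQ_VHOST_SET_OWNER;
}

/*
 * Decide which iotlb message type to send for a dma mapping.
 * Returns false when nothing needs to be sent: the mapping was preserved
 * across exec and the backend does not require the new userland address.
 */
bool pick_iotlb_type(const struct vdpa_dev *d, enum iotlb_type *type)
{
    assert(type != NULL);

    if (!d->cpr_restored) {
        *type = MSG_VHOST_IOTLB_UPDATE;   /* normal start: create the mapping */
        return true;
    }
    if (d->backend_iotlb_remap) {
        *type = MSG_VHOST_IOTLB_REMAP;    /* live update: refresh the address only */
        return true;
    }
    return false;
}
```

The point of the sketch is that the caller's control flow stays the same in both
cases; only the request code and message type change after a live update.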

- Steve