RE: [PATCH v5 0/8] vfio/hisilicon: add ACC live migration driver

From: Shameerali Kolothum Thodi
Date: Wed Feb 23 2022 - 10:54:03 EST




> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: 22 February 2022 19:30
> To: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
> kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-crypto@xxxxxxxxxxxxxxx; cohuck@xxxxxxxxxx; mgurtovoy@xxxxxxxxxx;
> yishaih@xxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; liulongfang
> <liulongfang@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>;
> Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>; Wangzhou (B)
> <wangzhou1@xxxxxxxxxxxxx>
> Subject: Re: [PATCH v5 0/8] vfio/hisilicon: add ACC live migration driver
>
> On Mon, 21 Feb 2022 20:49:43 -0400
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > On Mon, Feb 21, 2022 at 11:40:35AM +0000, Shameer Kolothum wrote:
> > >
> > > Hi,
> > >
> > > This series attempts to add vfio live migration support for
> > > HiSilicon ACC VF devices based on the new v2 migration protocol
> > > definition and mlx5 v8 series discussed here[0].
> > >
> > > RFCv4 --> v5
> > > - Dropped RFC tag as v2 migration APIs are more stable now.
> > > - Addressed review comments from Jason and Alex (Thanks!).
> > >
> > > This is sanity tested on a HiSilicon platform using the Qemu branch
> > > provided here[1].
> > >
> > > Please take a look and let me know your feedback.
> > >
> > > Thanks,
> > > Shameer
> > > [0]
> https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@xxxxxxxxxx/
> > > [1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
> > >
> > >
> > > v3 --> RFCv4
> > > -Based on migration v2 protocol and mlx5 v7 series.
> > > -Added RFC tag again as migration v2 protocol is still under discussion.
> > > -Added new patch #6 to retrieve the PF QM data.
> > > -PRE_COPY compatibility check is now done after the migration data
> > >  transfer. This is not ideal and needs discussion.
> >
> > Alex, do you want to keep the PRE_COPY in just for acc for now? Or do
> > you think this is not a good temporary use for it?
> >
> > We have some work toward doing the compatibility more generally, but I
> > think it will be a while before that is all settled.
>
> In the original migration protocol I recall that we discussed
> using the pre-copy phase for compatibility testing, even without
> additional device data, as a valid use case. The migration driver of
> course needs to account for the fact that userspace is not required to
> perform a pre-copy, and therefore cannot rely on that exclusively for
> compatibility testing, but failing a migration earlier due to detection
> of an incompatibility is generally a good thing.
>
> If the ACC driver wants to re-incorporate this behavior into a non-RFC
> proposed series and we could align accepting them into the same kernel
> release, that sounds ok to me. Thanks,

Ok. I will add support for PRE_COPY and check compatibility early.

From the FSM arc point of view, I guess that means adding:

STATE_RUNNING --> STATE_PRE_COPY
    create the saving file;
    get_match_data();
    return fd;

STATE_PRE_COPY --> STATE_STOP_COPY
    stop_device();
    get_device_data();
    update the saving migf total_len;

resume_write()
    check compatibility once we have enough bytes.
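
To make the arcs above a bit more concrete, here is a rough userspace sketch in C. It is not the driver code: every name in it (mig_set_state, struct match_data, struct mig_file, struct resume_ctx, the field names and sizes) is made up for illustration, and resume_write() here takes a plain buffer rather than the real file-operation arguments. It only models the idea that match data is queued at PRE_COPY, device data is appended at STOP_COPY, and the destination checks compatibility as soon as the match bytes have arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state names, mirroring the arcs listed above. */
enum mig_state { ST_RUNNING, ST_PRE_COPY, ST_STOP_COPY };

/* Hypothetical match data: what the destination compares first. */
struct match_data {
	uint32_t dev_id;
	uint32_t qp_num;
};

/* Hypothetical saving file: match data first, device data after it. */
struct mig_file {
	struct match_data match;
	uint8_t dev_data[64];
	size_t total_len;	/* bytes currently readable by userspace */
};

struct mig_dev {
	enum mig_state state;
	bool stopped;
	struct match_data hw;	/* identity of the source device */
	struct mig_file migf;
};

/* Save side: returns 0 on success, -1 for arcs this sketch does not cover. */
int mig_set_state(struct mig_dev *d, enum mig_state next)
{
	assert(d != NULL);

	if (d->state == ST_RUNNING && next == ST_PRE_COPY) {
		memset(&d->migf, 0, sizeof(d->migf));	/* create the saving file */
		d->migf.match = d->hw;			/* get_match_data() */
		d->migf.total_len = sizeof(d->migf.match);
	} else if (d->state == ST_PRE_COPY && next == ST_STOP_COPY) {
		d->stopped = true;			/* stop_device() */
		/* Stand-in for get_device_data(): fill with a dummy pattern. */
		memset(d->migf.dev_data, 0xAB, sizeof(d->migf.dev_data));
		d->migf.total_len += sizeof(d->migf.dev_data);
	} else {
		return -1;
	}
	d->state = next;
	return 0;
}

/* Resume side: buffers incoming bytes until the match data is complete. */
struct resume_ctx {
	struct match_data local;	/* identity of the destination device */
	uint8_t buf[sizeof(struct match_data)];
	size_t have;			/* match bytes buffered so far */
	bool checked;
};

/* Returns the number of bytes accepted, or -1 on an incompatible source. */
long resume_write(struct resume_ctx *r, const uint8_t *p, size_t len)
{
	assert(r != NULL && p != NULL);

	if (!r->checked) {
		size_t need = sizeof(r->buf) - r->have;
		size_t used = len < need ? len : need;

		memcpy(r->buf + r->have, p, used);
		r->have += used;
		/* Check compatibility once we have enough bytes. */
		if (r->have == sizeof(r->buf)) {
			if (memcmp(r->buf, &r->local, sizeof(r->buf)) != 0)
				return -1;
			r->checked = true;
		}
	}
	/* Anything past the match data is device data; the sketch just accepts it. */
	return (long)len;
}
```

The point of splitting it this way is that a mismatch is reported on the first write that completes the match data, before any device state has been transferred. The direct RUNNING to STOP path that userspace may take without a pre-copy is deliberately left out of the sketch.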

Also add support for the VFIO_DEVICE_MIG_PRECOPY ioctl.

I will have a go and send out a revised series.

Thanks,
Shameer