RE: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver

From: Shameerali Kolothum Thodi
Date: Wed Feb 02 2022 - 11:10:18 EST




> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@xxxxxxxxxx]
> Sent: 02 February 2022 15:40
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-crypto@xxxxxxxxxxxxxxx; alex.williamson@xxxxxxxxxx;
> mgurtovoy@xxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; liulongfang
> <liulongfang@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>;
> yuzenghui <yuzenghui@xxxxxxxxxx>; Jonathan Cameron
> <jonathan.cameron@xxxxxxxxxx>; Wangzhou (B) <wangzhou1@xxxxxxxxxxxxx>
> Subject: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver
>
> On Wed, Feb 02, 2022 at 02:34:52PM +0000, Shameerali Kolothum Thodi
> wrote:
>
> > > There are few topics to consider:
> > > - Which of the three feature sets (STOP_COPY, P2P and PRECOPY) make
> > > sense for this driver?
> >
> > I think it will be STOP_COPY only for now. We might have PRECOPY
> > feature once we have the SMMUv3 HTTU support in future.
>
> HTTU is the dirty tracking feature? To be clear VFIO migration support for
> PRECOPY has nothing to do with IOMMU based dirty page tracking.

Yes, it is based on the IOMMU hardware dirty bit management support.
A RFC was posted sometime back,
https://lore.kernel.org/kvm/20210507103608.39440-1-zhukeqian1@xxxxxxxxxx/

Ok, my guess was that the PRECOPY here was related. Thanks for clarifying.

>
> > > - I think we discussed the P2P implementation and decided it would
> > > work for this device? Can you re-read and confirm?
> >
> > In our case these devices are Integrated End Point devices and doesn't
> > have P2P DMA capability. Hence the FSM arcs will be limited to
> > STOP_COPY feature I guess. Also, since we cannot guarantee a NDMA
> > state in STOP, my assumption currently is the onus of making sure that
> > no MMIO access happens in STOP is on the user. Is that a valid assumption?
>
> Yes, you can treat RUNNING_P2P as the same as STOP and rely on no MMIO
> access to sustain it.

Ok.

> (and I'm wondering sometimes if we should rename RUNNING_P2P to
> STOP_P2P - ie the device is stopped but still allows inbound P2P to make this
> clearer)
>
> > Do we need to set the below before the feature query?
> > Or am I using a wrong Qemu/kernel repo?
> >
> > +++ b/hw/vfio/migration.c
> > @@ -488,6 +488,7 @@ static int vfio_migration_query_flags(VFIODevice
> > *vbasedev, uint64_t *mig_flags)
> > struct vfio_device_feature_migration *mig = (void
> > *)feature->data;
> >
> > feature->argsz = sizeof(buf);
> > + feature->flags = VFIO_DEVICE_FEATURE_MIGRATION |
> > + VFIO_DEVICE_FEATURE_GET;
> > if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature) != 0)
> > return -EOPNOTSUPP;
>
> Oh, this is my mistake I thought this got pushed to that github already but
> didn't, I updated it.

Ok. Thanks.

> If you have a prototype can you post another RFC?

Sure, will do. I just started and has only a skeleton proto based on v2 now.
Will send out a RFC soon once I have all the FSM arcs implemented
and sanity tested.

Thanks,
Shameer