Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver

From: Joao Martins
Date: Mon Feb 14 2022 - 08:34:53 EST


On 2/12/22 00:01, Jason Gunthorpe wrote:
> On Fri, Feb 11, 2022 at 09:43:56PM +0000, Joao Martins wrote:
>> 1. Decodes the read and write intent from the memory access.
>> 2. If P=0 in the page descriptor, fail the access.
>> 3. Compare the A & D bits in the descriptor with the read and write intent in the request.
>> 4. If the A or D bits need to be updated in the descriptor:
>
> Ah, so the dirty update is actually atomic on the first write before
> any DMA happens - and I suppose all of this happens when the entry is
> first loaded into the IOTLB.
>
Intel's equivalent feature suggests me that this works the same way there.

ARM update of ioptes looks to be slightly different[*] but this FEAT_BBM
in SMMUv3.2 makes it work in similar way to Intel/AMD. But I could be
misunderstanding something there.

[*] apparently we need to write an invalid entry first, invalidate the {IO}TLB
and then write the new valid entry. Not sure I understood correctly that this
is the 'break-before-make' thingie.

> So the flush is to allow the IOTLB to see the cleared D bit..
>
Right.

>>> split/collapse seems kind of orthogonal to me it doesn't really
>>> connect to dirty tracking other than being mostly useful during dirty
>>> tracking.
>>>
>>> And I wonder how hard split is when trying to atomically preserve any
>>> dirty bit..
>>>
>> Would would it be hard? The D bit is supposed to be replicated when you
>> split to smaller page size.
>
> I guess it depends on how the 'acquire' is done, as the CPU has to
> atomically replace a large entry with a pointer to a small entry,
> flush the IOTLB then 'acquire' the dirty bit. If the dirty bit is set
> in the old entry then it has to sprinkle it into the new entries with
> atomics.
>
ISTR some mention to what we are chatting here in the IOMMU SDM:

When a non-default page size is used , software must OR the Dirty bits in
all of the replicated host PTEs used to map the page. The IOMMU does not
guarantee the Dirty bits are set in all of the replicated PTEs. Any portion
of the page may have been written even if the Dirty bit is set in only one
of the replicated PTEs.

>> I wonder if we could start progressing the dirty tracking as a first initial series and
>> then have the split + collapse handling as a second part? That would be quite
>> nice to get me going! :D
>
> I think so, and I think we should. It is such a big problem space, it
> needs to get broken up.

OK, cool! I'll stick with the same (slimmed down) IOMMU+VFIO interface as proposed in the
past except with the x86 support only[*]. And we poke holes there I guess.

[*] I might include Intel too, albeit emulated only.