RE: [PATCH v2 1/6] iommu/vt-d: Setup scalable mode context entry in probe path

From: Tian, Kevin
Date: Sun Dec 10 2023 - 23:06:39 EST


> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Saturday, December 9, 2023 3:53 PM
>
> On 12/8/23 4:50 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> >> Sent: Tuesday, December 5, 2023 9:22 AM
> >>
> >> @@ -304,6 +304,11 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
> >>  		return -EINVAL;
> >>  	}
> >>
> >> +	if (intel_pasid_setup_sm_context(dev, true)) {
> >> +		dev_err(dev, "Context entry is not configured\n");
> >> +		return -ENODEV;
> >> +	}
> >> +
> >>  	spin_lock(&iommu->lock);
> >>  	pte = intel_pasid_get_entry(dev, pasid);
> >>  	if (!pte) {
> >> @@ -384,6 +389,11 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> >>  		return -EINVAL;
> >>  	}
> >>
> >> +	if (intel_pasid_setup_sm_context(dev, true)) {
> >> +		dev_err(dev, "Context entry is not configured\n");
> >> +		return -ENODEV;
> >> +	}
> >> +
> >>  	pgd = domain->pgd;
> >>  	agaw = iommu_skip_agaw(domain, iommu, &pgd);
> >>  	if (agaw < 0) {
> >> @@ -505,6 +515,11 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> >>  	u16 did = FLPT_DEFAULT_DID;
> >>  	struct pasid_entry *pte;
> >>
> >> +	if (intel_pasid_setup_sm_context(dev, true)) {
> >> +		dev_err(dev, "Context entry is not configured\n");
> >> +		return -ENODEV;
> >> +	}
> >> +
> >>  	spin_lock(&iommu->lock);
> >>  	pte = intel_pasid_get_entry(dev, pasid);
> >>  	if (!pte) {
> >
> > Instead of replicating the invocation in all three stubs, it's simpler
> > to do it once in dmar_domain_attach_device() for all of them.
>
> It's not good to repeat the code. Perhaps we can add this check to
> intel_pasid_get_entry()? The rule is that you can't get the pasid entry
> if the context is copied.

You can add a check in intel_pasid_get_entry(), but it's not clean to
put the delayed setup inside that helper, as we don't know when it
might be called. It's clearer to do the delayed setup right at the
attach point, e.g. as sketched below.
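
A minimal, untested sketch of what I mean. The surrounding body of
dmar_domain_attach_device() is elided and assumed here; only the
placement of the intel_pasid_setup_sm_context() call matters:

static int dmar_domain_attach_device(struct dmar_domain *domain,
				     struct device *dev)
{
	struct device_domain_info *info = dev_iommu_priv_get(dev);
	struct intel_iommu *iommu = info->iommu;
	int ret;

	/*
	 * Do the delayed scalable-mode context setup once here, so
	 * intel_pasid_setup_first_level(), _second_level() and
	 * _pass_through() don't each need to repeat it.
	 */
	if (sm_supported(iommu)) {
		ret = intel_pasid_setup_sm_context(dev, true);
		if (ret)
			return ret;
	}

	/* ... existing attach and pasid setup logic unchanged ... */

	return 0;
}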

> >
> >> +
> >> +	/*
> >> +	 * Cache invalidation for changes to a scalable-mode context table
> >> +	 * entry.
> >> +	 *
> >> +	 * Section 6.5.3.3 of the VT-d spec:
> >> +	 * - Device-selective context-cache invalidation;
> >> +	 * - Domain-selective PASID-cache invalidation to affected domains
> >> +	 *   (can be skipped if all PASID entries were not-present);
> >> +	 * - Domain-selective IOTLB invalidation to affected domains;
> >> +	 * - Global Device-TLB invalidation to affected functions.
> >> +	 *
> >> +	 * For kdump cases, old valid entries may be cached due to the
> >> +	 * in-flight DMA and copied pgtable, but there is no unmapping
> >> +	 * behaviour for them, thus we need explicit cache flushes for all
> >> +	 * affected domain IDs and PASIDs used in the copied PASID table.
> >> +	 * Given that we have no idea about which domain IDs and PASIDs
> >> +	 * were used in the copied tables, upgrade them to global PASID
> >> +	 * and IOTLB cache invalidation.
> >> +	 *
> >> +	 * For the kdump case, at this point, the device is supposed to
> >> +	 * have finished reset at its driver probe stage, so no in-flight
> >> +	 * DMA will exist, and we don't need to worry anymore hereafter.
> >> +	 */
> >> +	if (context_copied(iommu, bus, devfn)) {
> >> +		context_clear_entry(context);
> >> +		clear_context_copied(iommu, bus, devfn);
> >> +		iommu->flush.flush_context(iommu, 0,
> >> +					   (((u16)bus) << 8) | devfn,
> >> +					   DMA_CCMD_MASK_NOBIT,
> >> +					   DMA_CCMD_DEVICE_INVL);
> >> +		qi_flush_pasid_cache(iommu, 0, QI_PC_GLOBAL, 0);
> >> +		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
> >> +		devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
> >> +	}
> >
> > I don't see this logic in the existing code. If it's a bug fix, then
> > please send it separately first.
>
> This code originates from domain_context_mapping_one(). It's not a bug
> fix.

but it's not the same flow. The existing code does a domain-selective
flush using the old DID, and it doesn't touch the PASID cache or the
device TLB:

	/*
	 * For kdump cases, old valid entries may be cached due to the
	 * in-flight DMA and copied pgtable, but there is no unmapping
	 * behaviour for them, thus we need an explicit cache flush for
	 * the newly-mapped device. For kdump, at this point, the device
	 * is supposed to finish reset at its driver probe stage, so no
	 * in-flight DMA will exist, and we don't need to worry anymore
	 * hereafter.
	 */
	if (context_copied(iommu, bus, devfn)) {
		u16 did_old = context_domain_id(context);

		if (did_old < cap_ndoms(iommu->cap)) {
			iommu->flush.flush_context(iommu, did_old,
						   (((u16)bus) << 8) | devfn,
						   DMA_CCMD_MASK_NOBIT,
						   DMA_CCMD_DEVICE_INVL);
			iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
						 DMA_TLB_DSI_FLUSH);
		}

		clear_context_copied(iommu, bus, devfn);
	}

>
> >> +
> >> +	context_entry_set_pasid_table(context, dev);
> >
> > and here is an additional change to the context entry. Why is the
> > context cache invalidated at the start?
>
> The previous context entry may have been copied from a previous kernel.
> Therefore, we need to tear down the entry and flush the caches before
> reusing it.

There is no reuse of the entry before this function returns, so why
not do just one flush at the end, after the entry is rewritten?
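
i.e. something along these lines (an untested sketch, reusing the
helpers from your patch as-is; the 'copied' local is just for
illustration):

	bool copied = context_copied(iommu, bus, devfn);

	if (copied) {
		context_clear_entry(context);
		clear_context_copied(iommu, bus, devfn);
	}

	context_entry_set_pasid_table(context, dev);

	/* one combined flush after the entry is fully rewritten */
	if (copied) {
		iommu->flush.flush_context(iommu, 0,
					   (((u16)bus) << 8) | devfn,
					   DMA_CCMD_MASK_NOBIT,
					   DMA_CCMD_DEVICE_INVL);
		qi_flush_pasid_cache(iommu, 0, QI_PC_GLOBAL, 0);
		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
		devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
	}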