Re: [PATCH v2 4/7] iommu: Let iommu.strict override ops->def_domain_type

From: Robin Murphy
Date: Mon Nov 28 2022 - 11:56:53 EST


On 2022-11-28 15:54, Niklas Schnelle wrote:
On Mon, 2022-11-28 at 09:29 -0400, Jason Gunthorpe wrote:
On Mon, Nov 28, 2022 at 12:10:39PM +0100, Niklas Schnelle wrote:
On Thu, 2022-11-17 at 09:55 +0800, Baolu Lu wrote:
On 2022/11/17 1:16, Niklas Schnelle wrote:
When iommu.strict=1 is set or iommu_set_dma_strict() was called we
should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type.

Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
---
drivers/iommu/iommu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 65a3b3d886dc..d9bf94d198df 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
 
+	if (iommu_dma_strict)
+		return IOMMU_DOMAIN_DMA;
If any quirky device must work in IOMMU identity mapping mode, this
might introduce a functional regression. At least for VT-d platforms,
some devices do require IOMMU identity mapping mode for functionality.

That's a good point. How about instead of unconditionally returning
IOMMU_DOMAIN_DMA we only do so if the domain type returned by
ops->def_domain_type uses a flush queue (i.e. the __IOMMU_DOMAIN_DMA_FQ
bit is set)? That way a device that only supports identity mapping gets
to keep it, but iommu_dma_strict at least always prevents use of an
IOVA flush queue.
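For illustration, the check suggested here might look something like the following standalone sketch. The flag and type definitions are copied from include/linux/iommu.h; the helper name strict_def_domain_type() and the userspace harness shape are made up for illustration, not actual iommu.c code:

```c
#include <assert.h>
#include <stdbool.h>

/* Domain type composition flags, as in include/linux/iommu.h */
#define __IOMMU_DOMAIN_PAGING	(1U << 0)
#define __IOMMU_DOMAIN_DMA_API	(1U << 1)
#define __IOMMU_DOMAIN_PT	(1U << 2)
#define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)

#define IOMMU_DOMAIN_IDENTITY	(__IOMMU_DOMAIN_PT)
#define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING | __IOMMU_DOMAIN_DMA_API)
#define IOMMU_DOMAIN_DMA_FQ	(IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_DMA_FQ)

/*
 * Hypothetical helper: demote a flush-queue default to strict
 * IOMMU_DOMAIN_DMA when strict mode is requested, but leave
 * non-flush-queue types (e.g. identity) untouched.
 */
static unsigned int strict_def_domain_type(unsigned int drv_type,
					   bool dma_strict)
{
	if (dma_strict && (drv_type & __IOMMU_DOMAIN_DMA_FQ))
		return IOMMU_DOMAIN_DMA;
	return drv_type;
}
```

So a quirky identity-only device keeps IOMMU_DOMAIN_IDENTITY even with iommu.strict=1, while a flush-queue preference is demoted to strict DMA.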

I would prefer we create some formal caps in iommu_ops to describe
whatever it is you are trying to do.

Jason

I agree that there is currently a lack of distinction between which
domain types can be used (capability) and which should be used as
default (iommu.strict=<x>, iommu_set_...(), CONFIG_IOMMU_DEFAULT_DMA,
ops->def_domain_type).

As far as I'm concerned, the purpose of .def_domain_type is really just for quirks where the device needs an identity mapping, based on knowledge that tends to be sufficiently platform-specific that we prefer to delegate it to drivers. What apple-dart is doing is really just a workaround for not being able to indicate per-instance domain type support at the point of the .domain_alloc call, and IIRC what mtk_iommu_v1 is doing is a horrible giant hack around the arch/arm DMA ops that don't understand IOMMU groups. Both of those situations are on the cards to be cleaned up, so don't take too much from them.

My case though is about the latter, which I think has some undocumented
and surprising precedences built in at the moment. With this series we
can use all of IOMMU_DOMAIN_DMA(_FQ, _SQ) on any PCI device, but we want
to default to either IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_DMA_SQ based on
whether we're running under a paging hypervisor (z/VM or KVM), to get
the best performance. From a semantic point of view I felt that this is
a good match for ops->def_domain_type, in that we pick a default but
it's still possible to change the domain type e.g. via sysfs. Now this
had the problem that ops->def_domain_type would cause
IOMMU_DOMAIN_DMA_FQ to be used even if iommu_set_dma_strict() was
called (via iommu.strict=1), because there is an undocumented override
of ops->def_domain_type over iommu_def_domain_type, which I believe
comes from the mixing of capability and default you also point at.

I think ideally we need two separate mechanisms: one to determine which
domain types can be used for a particular device (capability), and one
to determine which of those to default to, with the latter having a
clear precedence between the options. Put together, I think
iommu.strict=1 should override a device's preference
(ops->def_domain_type) and CONFIG_IOMMU_DEFAULT_DMA and use
IOMMU_DOMAIN_DMA, but of course only if the device is capable of that.
Does that sound reasonable?
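To make that precedence concrete, it could be sketched roughly as below. The capability mask, the dev_supports() helper, and the simplified type constants are all hypothetical, purely illustrative stand-ins for mechanisms that don't exist upstream in this form:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the real IOMMU_DOMAIN_* constants */
enum { DOMAIN_IDENTITY = 1, DOMAIN_DMA = 2, DOMAIN_DMA_FQ = 3 };

/* Hypothetical capability check: which types a device can actually use */
static bool dev_supports(unsigned int caps, int type)
{
	return caps & (1U << type);
}

/*
 * iommu.strict=1 beats both the driver preference and the config
 * default, but only when the device is capable of strict DMA mapping;
 * an identity-only device still gets its required type.
 */
static int choose_default_type(unsigned int caps, bool strict,
			       int drv_pref, int cfg_default)
{
	if (strict && dev_supports(caps, DOMAIN_DMA))
		return DOMAIN_DMA;
	if (drv_pref && dev_supports(caps, drv_pref))
		return drv_pref;
	return cfg_default;
}
```

The point of the split is that the capability mask answers "can this work at all?" while the precedence chain only ever picks among the capable options.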

That sounds like essentially what we already have, though. The current logic should be thus:

1: If the device is untrusted, it gets strict translation, nothing else. If that won't actually work, tough.
2: If .def_domain_type returns a specific type, it is because any other type will not work correctly at all, so we must use that.
3: Otherwise, we compute the user's preferred default type based on kernel config and command line options.
4: Then we determine whether the IOMMU driver actually supports that type, by trying to allocate it. If allocation fails and the preferred type was more relaxed than IOMMU_DOMAIN_DMA, fall back to the stricter type and try one last time.

AFAICS the distinction and priority of those steps is pretty clear:

1: Core requirements
2: Driver-specific requirements
3: Core preference
4: Driver-specific support

Now, for step 4 we *could* potentially use static capability flags in place of the "try allocating different things until one succeeds", but that doesn't change anything other than saving the repetitive boilerplate in everyone's .domain_alloc implementations. The real moral of the story here is not to express a soft preference where it will be interpreted as a hard requirement.

Thanks,
Robin.