Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

From: Robin Murphy
Date: Fri Oct 22 2021 - 09:39:53 EST


On 2021-10-22 09:06, Marc Zyngier wrote:
> On Fri, 22 Oct 2021 03:52:38 +0100,
> Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:

>> On 10/21/21 4:10 PM, Marc Zyngier wrote:
>>> On Thu, 21 Oct 2021 03:22:30 +0100,
>>> Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:

>>>> On 10/20/21 10:22 PM, Marc Zyngier wrote:
>>>>> On Wed, 20 Oct 2021 06:21:44 +0100,
>>>>> Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:

>>>>>> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>>>>>>> +	/*
>>>>>>> +	 * Check that CPU pages can be represented by the IOVA granularity.
>>>>>>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
>>>>>>> +	 * only limit domain->pgsize_bitmap after having attached the first
>>>>>>> +	 * device.
>>>>>>> +	 */
>>>>>>> +	ret = iommu_check_page_size(domain);
>>>>>>> +	if (ret) {
>>>>>>> +		__iommu_detach_device(domain, dev);
>>>>>>> +		return ret;
>>>>>>> +	}
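The helper name iommu_check_page_size() comes from the patch, but its body
is not quoted in this thread; the kernel-context sketch below is only a
guess at what such a check boils down to, namely failing the attach when
the domain offers no granule as small as a CPU page:

/*
 * Sketch only, not the patch's actual body: pass iff the domain
 * supports at least one granule no larger than a CPU page, so a
 * PAGE_SIZE allocation never has to share a mapping with its
 * neighbours.
 */
static int iommu_check_page_size(struct iommu_domain *domain)
{
	/* smallest IOVA granule this domain can map */
	unsigned long granule = 1UL << __ffs(domain->pgsize_bitmap);

	return granule > PAGE_SIZE ? -EFAULT : 0;
}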

>>>>>> It looks odd. __iommu_attach_device() attaches an I/O page table for a
>>>>>> device. How does it relate to CPU pages? Why is it a failure case if the
>>>>>> CPU page size is not covered?

>>>>> If you allocate a CPU PAGE_SIZE'd region and point it at a device
>>>>> that can now DMA to more than what you have allocated because the
>>>>> IOMMU's own page size is larger, the device now has access to data it
>>>>> shouldn't see. In my book, that's a pretty bad thing.
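To put numbers on that (the addresses below are invented for the example):
with 4K CPU pages and a 16K IOMMU granule, mapping a single CPU page for
DMA necessarily maps the whole 16K-aligned block around it:

#include <stdio.h>

int main(void)
{
	unsigned long cpu_page = 0x1000;    /* 4K CPU page size */
	unsigned long granule  = 0x4000;    /* 16K IOMMU granule */
	unsigned long buf      = 0x21000;   /* the one page we allocated */

	/* the mapping must be granule-aligned and granule-sized */
	unsigned long map_start = buf & ~(granule - 1);
	unsigned long map_end   = map_start + granule;

	printf("device sees [%#lx, %#lx), caller allocated [%#lx, %#lx)\n",
	       map_start, map_end, buf, buf + cpu_page);
	/* -> device sees [0x20000, 0x24000), but only [0x21000, 0x22000)
	 *    was allocated; the other three 4K pages leak to the device */
	return 0;
}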

>>>> But even if you enforce the CPU page size check here, this problem still
>>>> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?

>>> Let me take a CPU analogy: you have a page that contains some user
>>> data *and* a kernel secret. How do you map this page into userspace
>>> without leaking the kernel secret?

>>> PAGE_SIZE allocations are the unit of isolation, and this applies to
>>> both CPU and IOMMU. If you have allocated a DMA buffer that is less
>>> than a page, you then have to resort to bounce buffering, or accept
>>> that your data isn't safe.
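A made-up illustration of that bounce-buffering fallback, with an assumed
16K granule (the names here are invented, this is not the kernel's swiotlb
code): the sub-page buffer is copied through a dedicated allocation that is
granule-sized and granule-aligned, so the device's mapping contains nothing
else:

#include <stdlib.h>
#include <string.h>

#define IOMMU_GRANULE 0x4000	/* assumed 16K IOVA granule */

/* Copy a sub-granule buffer into a granule of its own before mapping;
 * the device is then pointed at 'bounce' and can never reach data that
 * merely happened to share a granule with 'buf'. */
static void *bounce_map(const void *buf, size_t len)
{
	void *bounce;

	if (len > IOMMU_GRANULE)
		return NULL;
	bounce = aligned_alloc(IOMMU_GRANULE, IOMMU_GRANULE);
	if (!bounce)
		return NULL;
	memcpy(bounce, buf, len);
	return bounce;		/* map this for DMA, not 'buf' */
}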

>> I can understand the problems when the IOMMU page size is larger than
>> the CPU page size. But the code itself is not clean. The vendor iommu
>> drivers know more hardware details than the iommu core. It looks odd
>> that the vendor iommu driver says "okay, I can attach this I/O page
>> table to the device", but the iommu core says "no, you can't" and rolls
>> everything back.

> If your IOMMU driver can do things behind the core's back and
> contradict the view that the core has, then it is probably time to fix
> your IOMMU driver and make the core aware of what is going on. The set
> of supported page sizes is one of these things.
>
> In general, keeping the IOMMU driver as dumb as possible is a worthy
> design goal, and this is why we have these abstractions.
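For concreteness, the existing way a driver makes the core aware of its
granules is the pgsize_bitmap; a sketch for hypothetical hardware that can
only map 16K/2M/32M (all names and values below are invented):

#include <linux/iommu.h>
#include <linux/sizes.h>
#include <linux/slab.h>

/* Invented example driver: hardware limited to 16K/2M/32M mappings. */
#define MY_IOMMU_PGSIZES	(SZ_16K | SZ_2M | SZ_32M)

struct my_iommu_domain {
	struct iommu_domain domain;
};

static struct iommu_domain *my_iommu_domain_alloc(unsigned int type)
{
	struct my_iommu_domain *dom = kzalloc(sizeof(*dom), GFP_KERNEL);

	if (!dom)
		return NULL;
	/* Advertise the supported granules up front, so a generic check
	 * like iommu_check_page_size() needs no driver-specific logic. */
	dom->domain.pgsize_bitmap = MY_IOMMU_PGSIZES;
	return &dom->domain;
}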

In this case it's the abstractions that are the problem, though. Any
driver which supports heterogeneous IOMMU instances with potentially
differing page sizes currently has no choice but to do horrible bodges
to make the bus-based iommu_domain_alloc() paradigm work *at all*.
Fixing that from the fundamental API level upwards has been on the
to-do list for some time now, but won't be straightforward.
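To spell out the constraint (the wrapper below is hypothetical; only the
iommu_domain_alloc() call itself is the real API of this era):

#include <linux/iommu.h>
#include <linux/platform_device.h>

static struct iommu_domain *alloc_domain_for_platform_bus(void)
{
	/*
	 * All the caller can name here is the bus. If two IOMMU
	 * instances on this bus use different granules (say 4K vs.
	 * 16K), the correct pgsize_bitmap is unknowable until a
	 * device is actually attached.
	 */
	return iommu_domain_alloc(&platform_bus_type);
}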

Robin.