Re: [PATCH] intel-iommu: Manage iommu_coherency globally

From: Chris Wright
Date: Sat Nov 19 2011 - 14:18:13 EST


* David Woodhouse (dwmw2@xxxxxxxxxxxxx) wrote:
> On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> > We currently manage iommu_coherency on a per domain basis,
> > choosing the safest setting across the iommus attached to a
> > particular domain. This unfortunately has a bug that when
> > no iommus are attached, the domain defaults to coherent.
> > If we fall into this mode, then later add a device behind a
> > non-coherent iommu to that domain, the context entry is
> > updated using the wrong coherency setting, and we get dmar
> > faults.
> >
> > Since we expect chipsets to be consistent in their coherency
> > setting, we can instead determine the coherency once and use
> > it globally.
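
For reference, the per-domain computation in question is roughly the
following (a sketch modeled on domain_update_iommu_coherency() in
drivers/iommu/intel-iommu.c; note that with an empty iommu_bmp the loop
never runs, so the domain stays flagged coherent):

static void domain_update_iommu_coherency(struct dmar_domain *domain)
{
        int i;

        /* BUG: defaults to coherent even with no IOMMUs attached */
        domain->iommu_coherency = 1;

        for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
                if (!ecap_coherent(g_iommus[i]->ecap)) {
                        domain->iommu_coherency = 0;
                        break;
                }
        }
}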
>
> (Adding Rajesh).
>
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page-walk faster, and so reduces the latency for IOTLB misses.

Interesting, because for the workloads I've tested it's the exact opposite.
I tested with the BIOS both enabling and disabling coherency, using
non-coherent access and streaming DMA (i.e. bare metal NIC bandwidth
testing)... the IOMMU added something like 10% overhead in the
non-coherent case vs. coherent.
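
That overhead tracks with the flush the driver has to do on every
page-table update when the unit isn't coherent; the path is roughly this
shape (sketch of __iommu_flush_cache() in intel-iommu.c):

static void __iommu_flush_cache(struct intel_iommu *iommu,
                                void *addr, int size)
{
        /* non-coherent units need page-table writes pushed out of
         * the CPU cache before the hardware page-walk can see them */
        if (!ecap_coherent(iommu->ecap))
                clflush_cache_range(addr, size);
}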

> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
>
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
>
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
>
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about so it's simple enough to
> do the right thing, surely?

Definitely agreed on the above points: page table setup/teardown is
largely confined to VM domains, and it's the bare metal case that is
sensitive to the IOMMU's cache-flushing overhead.
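
So perhaps the per-domain computation just needs a safe default for the
unattached case rather than going global; an untested sketch:

static void domain_update_iommu_coherency(struct dmar_domain *domain)
{
        int i;

        /* default to non-coherent (safe) when no IOMMU is attached */
        i = find_first_bit(&domain->iommu_bmp, g_num_of_iommus);
        domain->iommu_coherency = i < g_num_of_iommus ? 1 : 0;

        for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
                if (!ecap_coherent(g_iommus[i]->ecap)) {
                        domain->iommu_coherency = 0;
                        break;
                }
        }
}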

thanks,
-chris