Re: [PATCH v2] iommu/arm-smmu: Break insecure users by disabling bypass by default

From: Robin Murphy
Date: Fri Oct 04 2019 - 12:36:20 EST


On 04/10/2019 16:23, Tim Harvey wrote:
On Thu, Oct 3, 2019 at 3:24 PM Robin Murphy <robin.murphy@xxxxxxx> wrote:

On 2019-10-03 9:51 pm, Tim Harvey wrote:
On Thu, Oct 3, 2019 at 1:42 PM Robin Murphy <robin.murphy@xxxxxxx> wrote:

Hi Tim,

On 2019-10-03 7:27 pm, Tim Harvey wrote:
On Fri, Mar 1, 2019 at 11:21 AM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:

If you're bisecting why your peripherals stopped working, it's
probably this CL. Specifically if you see this in your dmesg:
Unexpected global fault, this could be serious
...then it's almost certainly this CL.

Running your IOMMU-enabled peripherals with the IOMMU in bypass mode
is insecure and effectively disables the protection they provide.
There are few reasons to allow unmatched stream bypass, and even fewer
good ones.

This patch starts the transition over to make it much harder to run
your system insecurely. Expected steps:

1. By default disable bypass (so anyone insecure will notice) but make
it easy for someone to re-enable bypass with just a KConfig change.
That's this patch.

2. After people have had a little time to come to grips with the fact
that they need to set their IOMMUs properly and have had time to
dig into how to do this, the KConfig will be eliminated and bypass
will simply be disabled. Folks who are truly upset and still
haven't fixed their system can either figure out how to add
'arm-smmu.disable_bypass=n' to their command line or revert the
patch in their own private kernel. Of course these folks will be
less secure.

Suggested-by: Robin Murphy <robin.murphy@xxxxxxx>
Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
---

Hi Doug / Robin,

I ran into this breaking things on OcteonTx boards based on CN80XX
CPU. The IOMMU configuration is a bit beyond me and I'm hoping you can
offer some advice. The IOMMU here is cavium,smmu-v2 as defined in
https://github.com/Gateworks/dts-newport/blob/master/cn81xx-linux.dtsi

Booting with 'arm-smmu.disable_bypass=n' does indeed work around the
breakage as the commit suggests.

Any suggestions for a proper fix?

Ah, you're using the old "mmu-masters" binding (and in a way which isn't
well-defined - it's never been specified what the stream ID argument(s)
would mean for a PCI host bridge, and Linux just ignores them). The
ideal thing would be to update the DT to generic "iommu-map" properties
- it's been a long time since I last played with a ThunderX, but I
believe the SMMU stream IDs should just be the same as the ITS device
IDs (which is how the "mmu-masters" mapping would have played out anyway).

The arm-smmu driver support for the old binding has always relied on
implicit bypass - there are technical reasons why we can't realistically
support the full functionality offered to the generic bindings, but it
would be possible to add some degree of workaround to prevent it
interacting quite so poorly with disable_bypass, if necessary. Do you
have deployed systems with DTs that can't be updated, but still might
need to run new kernels?


Robin,

Thanks for the response. I don't care too much about supporting new
kernels with the current DT - I'm good with fixing this with a DT
change. Would you be able to give me an example? I would love to see
Cavium mainline a cn81xx dts/dtsi in arch/arm64/boot/dts to be used
as a base as the only thing we have to go off of currently is the
Cavium SDK which has fairly old kernel support.

No promises (it's a late-night hack from my sofa), but try giving this a
go...

Robin.

----->8-----
diff --git a/cn81xx-linux.dtsi b/cn81xx-linux.dtsi
index 3b759d9575fe..dabc9047c674 100644
--- a/cn81xx-linux.dtsi
+++ b/cn81xx-linux.dtsi
@@ -234,7 +234,7 @@
clocks = <&sclk>;
};

- smmu0@830000000000 {
+ smmu: smmu0@830000000000 {
compatible = "cavium,smmu-v2";
reg = <0x8300 0x0 0x0 0x2000000>;
#global-interrupts = <1>;
@@ -249,23 +249,18 @@
<0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>,
<0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>,
<0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>, <0 69 4>;
-
- mmu-masters = <&ecam0 0x100>,
- <&pem0 0x200>,
- <&pem1 0x300>,
- <&pem2 0x400>;
-
+ #iommu-cells = <1>;
+ dma-coherent;
};

ecam0: pci@848000000000 {
compatible = "pci-host-ecam-generic";
device_type = "pci";
- msi-parent = <&its>;
msi-map = <0 &its 0 0x10000>;
+ iommu-map = <0 &smmu 0 0x10000>;
bus-range = <0 31>;
#size-cells = <2>;
#address-cells = <3>;
- #stream-id-cells = <1>;
u-boot,dm-pre-reloc;
dma-coherent;
reg = <0x8480 0x00000000 0 0x02000000>; /* Configuration space */
@@ -399,12 +394,11 @@

compatible = "cavium,pci-host-thunder-pem";
device_type = "pci";
- msi-parent = <&its>;
msi-map = <0 &its 0 0x10000>;
+ iommu-map = <0 &smmu 0 0x10000>;
bus-range = <0x1f 0x57>;
#size-cells = <2>;
#address-cells = <3>;
- #stream-id-cells = <1>;
dma-coherent;
reg = <0x8800 0x1f000000 0x0 0x39000000>, /* Configuration space */
<0x87e0 0xc0000000 0x0 0x01000000>; /* PEM space */
@@ -424,12 +418,11 @@
pem1: pci@87e0c1000000 {
compatible = "cavium,pci-host-thunder-pem";
device_type = "pci";
- msi-parent = <&its>;
msi-map = <0 &its 0 0x10000>;
+ iommu-map = <0 &smmu 0 0x10000>;
bus-range = <0x57 0x8f>;
#size-cells = <2>;
#address-cells = <3>;
- #stream-id-cells = <1>;
dma-coherent;
reg = <0x8840 0x57000000 0x0 0x39000000>, /* Configuration space */
<0x87e0 0xc1000000 0x0 0x01000000>; /* PEM space */
@@ -449,12 +442,11 @@
pem2: pci@87e0c2000000 {
compatible = "cavium,pci-host-thunder-pem";
device_type = "pci";
- msi-parent = <&its>;
msi-map = <0 &its 0 0x10000>;
+ iommu-map = <0 &smmu 0 0x10000>;
bus-range = <0x8f 0xc7>;
#size-cells = <2>;
#address-cells = <3>;
- #stream-id-cells = <1>;
dma-coherent;
reg = <0x8880 0x8f000000 0x0 0x39000000>, /* Configuration space */
<0x87e0 0xc2000000 0x0 0x01000000>; /* PEM space */
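
For reference, a minimal sketch of what an entry such as `iommu-map = <0 &smmu 0 0x10000>` expresses under the generic binding: each entry is a (rid-base, iommu, iommu-base, length) tuple, and a PCI requester ID inside [rid-base, rid-base + length) maps to iommu-base plus its offset. The names `MapEntry`, `pci_rid`, `translate` and `entry_for_bus_range` are made up for illustration and are not kernel APIs. The narrowed entry is only an idea derived from the `bus-range` values in the diff and has not been tested on this hardware.

```python
from typing import NamedTuple, Optional, Sequence


class MapEntry(NamedTuple):
    """One iommu-map entry, with the phandle cell omitted (hypothetical type)."""
    rid_base: int
    sid_base: int
    length: int


def pci_rid(bus: int, dev: int, fn: int) -> int:
    """Pack bus/device/function into the 16-bit PCI requester ID."""
    return (bus << 8) | (dev << 3) | fn


def translate(rid: int, entries: Sequence[MapEntry]) -> Optional[int]:
    """Return the stream ID for rid, or None if no entry covers it."""
    for e in entries:
        if e.rid_base <= rid < e.rid_base + e.length:
            return e.sid_base + (rid - e.rid_base)
    return None


def entry_for_bus_range(first_bus: int, last_bus: int) -> MapEntry:
    """Identity entry covering only the RIDs on buses first_bus..last_bus."""
    base = first_bus << 8
    return MapEntry(base, base, (last_bus - first_bus + 1) << 8)


# The entry used in the diff: identity over the whole 16-bit RID space.
identity = [MapEntry(0, 0, 0x10000)]
sid = translate(pci_rid(0x1F, 0, 0), identity)

# An entry restricted to pem0's bus-range of <0x1f 0x57>.
pem0_only = entry_for_bus_range(0x1F, 0x57)
```

With the identity entry every RID maps to an equal stream ID, so all four bridges can carry the same property. The restricted entry for pem0 would correspond to `iommu-map = <0x1f00 &smmu 0x1f00 0x3900>`.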

Robin,

No difference... still need 'arm-smmu.disable_bypass=n' to boot. Are
all four iommu-map props above supposed to be the same? Seems to me
they all point to the same thing which looks wrong.

Hmm... :/

Those mappings just set Stream ID == PCI RID (strictly each one should
only need to cover the bus range assigned to that bridge, but it's not
crucial) which is the same thing the driver assumes for the mmu-masters
property, so either that's wrong and never could have worked anyway - have
you tried VFIO on this platform? - or there are other devices also
mastering through the SMMU that aren't described at all.

Are you able to capture a boot log? The SMMU faults do encode information
about the offending ID, and you can typically correlate their appearance
reasonably well with endpoint drivers probing.

Robin.
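
As a rough illustration of reading the offending ID out of a fault, the sketch below assumes that the global fault report includes the raw GFSYNR1 register value and that its low 16 bits hold the StreamID, as in the generic SMMUv2 register layout. Whether the Cavium implementation follows that layout is an assumption, as is Stream ID == PCI RID, which is exactly what is in question in this thread. The function names are hypothetical.

```python
def stream_id_from_gfsynr1(gfsynr1: int) -> int:
    """Extract the StreamID field, assumed to be bits [15:0] of GFSYNR1."""
    return gfsynr1 & 0xFFFF


def rid_to_bdf(rid: int) -> str:
    """Format a 16-bit PCI requester ID as bus:device.function (hex bus/dev)."""
    bus = (rid >> 8) & 0xFF
    dev = (rid >> 3) & 0x1F
    fn = rid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn}"


def describe_fault(gfsynr1: int) -> str:
    """Summarise a fault, assuming the stream ID equals the PCI RID."""
    sid = stream_id_from_gfsynr1(gfsynr1)
    return f"stream ID 0x{sid:04x} (PCI {rid_to_bdf(sid)} if SID == RID)"
```

If a decoded ID does not correspond to any device visible in `lspci`, that would point towards the second possibility above: a master behind the SMMU that the DT does not describe.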