Re: [PATCH] arm64/io: Don't use WZR in writel

From: AngeloGioacchino Del Regno
Date: Mon Feb 11 2019 - 11:02:58 EST


On Mon, 11/02/2019 at 11:52 +0000, Marc Zyngier wrote:
> On 11/02/2019 10:57, Will Deacon wrote:
> > On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del Regno wrote:
> > > From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
> > > From: "Angelo G. Del Regno" <kholk11@xxxxxxxxx>
> > > Date: Sat, 9 Feb 2019 18:56:46 +0100
> > > Subject: [PATCH] arm64/io: Don't use WZR in writel
> > >
> > > This is a partial revert of commit ee5e41b5f21a
> > > ("arm64/io: Allow I/O writes to use {W,X}ZR")
> > >
> > > When we try to use the zero register directly on some SoCs,
> > > their security will make them freeze due to a firmware bug.
> > > This behavior is seen with the arm-smmu driver freezing on
> > > TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.
>
> This looks similar to the issue these SoCs have with GICv3, worked
> around in 9c8114c20d18.
>

Well, yes, that's a firmware quirk, of course, caused by the "security"
stuff that they have inside...

> > Hmm, this sounds very fragile. I hope they're not trapping and emulating
> > MMIO accesses and treating the zero register as the stack pointer...
>
> I bet this is the case. The same bug was there in both KVM and Xen. The
> only difference is that we fixed it back in December 2015 (at least for
> KVM), while some of these SoCs were announced in 2017, and are still
> shipping. Great stuff.

Totally agree, they must be treating it as the stack pointer.
Poor decision.
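
To illustrate the suspected failure mode, here is a minimal sketch of how
a trap-and-emulate path could get the data register of a store wrong. It
is purely hypothetical: struct trap_regs and both helper names are
invented for the example, and nothing here is taken from the actual
firmware. In A64 store encodings, number 31 in the data register (Rt)
field means XZR/WZR, while in the base register field it means SP, which
makes the mix-up easy:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register state saved when an MMIO access traps. */
struct trap_regs {
	uint64_t x[31];	/* x0..x30 */
	uint64_t sp;	/* stack pointer of the trapped context */
};

/* Buggy decode: data register number 31 is read as the stack pointer. */
static uint64_t store_value_buggy(const struct trap_regs *r, unsigned int rt)
{
	assert(rt < 32);	/* Rt is a 5-bit field */
	return rt == 31 ? r->sp : r->x[rt];
}

/* Correct decode: data register number 31 is XZR/WZR and reads as zero. */
static uint64_t store_value_fixed(const struct trap_regs *r, unsigned int rt)
{
	assert(rt < 32);	/* Rt is a 5-bit field */
	return rt == 31 ? 0 : r->x[rt];
}
```

With the buggy helper, a "str wzr, [xN]" would be emulated as a store of
the trapped context's stack pointer instead of zero.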

>
> > Wouldn't this also be triggerable from userspace by mmap()ing either
> > /dev/mem or e.g. a PCI bar via sysfs?
> >
> > > Allocating a temporary register to store the zero for the
> > > write actually solves the issue on these SoCs.
> >
> > I don't think this catches all MMIO accesses, so I think we need to
> > understand more about the actual issue here. For example, is it only the
> > SMMU that causes this problem? Also, any workaround should be specific to
> > the broken SoCs.
>
> Also, nothing would prevent a compiler from generating these accesses.
>
> M.
>
> Jazz is not dead. It just smells funny...

While I agree that nothing would prevent a compiler from generating
these accesses, please keep in mind that everything worked on
downstream kernels before this change was introduced (it first
appeared downstream in msm-4.9).
I discovered it on msm-4.9 while porting MSM8996/8998 and SDM630/660
to that kernel, and it took a whole lot of head scratching: the arm-smmu
code was apparently right, and then I found this surprise...
By the way, I can tell you for sure that this bug is not present on
at least SDM845, since that one worked fine even before this fix,
and I imagine that SDM670 and newer SoCs may not be affected either.
Family-B SoCs (MSM8916-36-37-56-76) are not affected by this bug either.

Unfortunately, I couldn't think of any other solution for these
Family-A SoCs, partly because I'm not totally sure that arm-smmu is
the only driver that triggers this issue. When I fixed it on the
downstream kernel, I had also been hitting some other random freezes
that weren't related to the SMMU: the qseecom stuff was also acting
funny sometimes.
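
For reference, the difference at stake comes down to the inline asm
operand constraint. The sketch below is not the kernel's code: the
sketch_* names, the alignment assert, the "memory" clobber and the
non-arm64 fallback are all additions made so that it builds anywhere.
As far as I understand it, "rZ" allows the compiler to encode a constant
zero as WZR, while plain "r" makes it materialise the zero in a
general-purpose register first:

```c
#include <assert.h>
#include <stdint.h>

/* "rZ": a constant zero may be emitted directly as WZR. */
static inline void sketch_writel_zr(uint32_t val, volatile void *addr)
{
	assert(((uintptr_t)addr & 3) == 0);	/* 32-bit aligned */
#if defined(__aarch64__)
	__asm__ volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr) : "memory");
#else
	*(volatile uint32_t *)addr = val;	/* portable fallback */
#endif
}

/* "r": the value always goes through a general-purpose register. */
static inline void sketch_writel_tmp(uint32_t val, volatile void *addr)
{
	assert(((uintptr_t)addr & 3) == 0);	/* 32-bit aligned */
#if defined(__aarch64__)
	__asm__ volatile("str %w0, [%1]" : : "r" (val), "r" (addr) : "memory");
#else
	*(volatile uint32_t *)addr = val;	/* portable fallback */
#endif
}
```

On arm64, sketch_writel_zr(0, addr) can typically compile to a single
"str wzr, [xN]", whereas sketch_writel_tmp(0, addr) needs the zero moved
into a temporary register before the store. Both leave the same value in
memory; only the register named in the store instruction differs.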

Also, just one more thing: yes, this change applies ARM64-wide and,
from my findings, it only addresses certain Qualcomm SoCs, but
I'm not sure that only QC is affected by this: others may well
have the same stupid bug.