Re: bisected - arm64 kvm unit test failures

From: Marc Zyngier
Date: Wed Aug 01 2018 - 01:36:45 EST


On Tue, 31 Jul 2018 19:28:49 +0100,
Mike Galbraith <efault@xxxxxx> wrote:

Hi Mike,

>
> On Mon, 2018-07-30 at 18:24 +0200, Mike Galbraith wrote:
> > On Sun, 2018-07-29 at 13:47 +0200, Mike Galbraith wrote:
> > > FYI, per kvm unit tests, 4.16-rt definitely has more kvm issues.
>
> But it's not RT, or rather most of it isn't...
>
> > > huawei5:/abuild/mike/kvm-unit-tests # uname -r
> > > 4.16.18-rt11-rt
> > > huawei5:/abuild/mike/kvm-unit-tests # ./run_tests.sh
> > > PASS selftest-setup (2 tests)
> > > FAIL selftest-vectors-kernel
> > > FAIL selftest-vectors-user
> > > PASS selftest-smp (65 tests)
> > > PASS pci-test (1 tests)
> > > PASS pmu (3 tests)
> > > FAIL gicv2-ipi
> > > FAIL gicv3-ipi
> > > FAIL gicv2-active
> > > FAIL gicv3-active
> > > PASS psci (4 tests)
> > > FAIL timer
> > > huawei5:/abuild/mike/kvm-unit-tests #
> > >
> > > 4.14-rt passes all tests. The above is with the kvm raw_spinlock_t
> > > conversion patch applied, but the 4.12 based SLERT tree I cloned to
> > > explore arm-land in the first place shows only one timer failure, and
> > > has/needs it applied as well, which would seem to vindicate it.
> > >
> > > huawei5:/abuild/mike/kvm-unit-tests # uname -r
> > > 4.12.14-0.gec0b559-rt
> > > huawei5:/abuild/mike/kvm-unit-tests # ./run_tests.sh
> > > PASS selftest-setup (2 tests)
> > > PASS selftest-vectors-kernel (2 tests)
> > > PASS selftest-vectors-user (2 tests)
> > > PASS selftest-smp (65 tests)
> > > PASS pci-test (1 tests)
> > > PASS pmu (3 tests)
> > > PASS gicv2-ipi (3 tests)
> > > PASS gicv3-ipi (3 tests)
> > > PASS gicv2-active (1 tests)
> > > PASS gicv3-active (1 tests)
> > > PASS psci (4 tests)
> > > FAIL timer (8 tests, 1 unexpected failures)
> >
> > FWIW, this single timer failure was inspired by something in the 4.15
> > merge window.
>
> As noted, the single timer failure is an RT issue of some sort, and
> remains. The rest I bisected in @stable with the attached config, and
> confirmed that revert fixes up 4.16-rt as well (modulo singleton).

Is it something that is reproducible with the current mainline (non-RT)?

>
> a9c0e12ebee56ef06b7eccdbc73bab71d0018df8 is the first bad commit
> commit a9c0e12ebee56ef06b7eccdbc73bab71d0018df8
> Author: Marc Zyngier <marc.zyngier@xxxxxxx>
> Date: Mon Oct 23 17:11:20 2017 +0100
>
> KVM: arm/arm64: Only clean the dcache on translation fault
>
> The only case where we actually need to perform a dcache maintenance
> is when we map the page for the first time, and subsequent permission
> faults do not require cache maintenance. Let's make it conditional
> on not being a permission fault (and thus a translation fault).
>
> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> Signed-off-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
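
For anyone following the bisect, the decision described in that commit message can be modeled roughly as below. This is a toy sketch in plain C, not the kernel code: fault_kind, fault_stats, needs_dcache_clean() and handle_stage2_fault() are hypothetical names made up for illustration, and the "clean" is just a counter standing in for the real cache maintenance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fault classes for a stage-2 abort (illustration only). */
enum fault_kind {
	FAULT_TRANSLATION,	/* page not mapped yet: first mapping */
	FAULT_PERMISSION	/* page already mapped, access rights too narrow */
};

/* Hypothetical bookkeeping: counts simulated dcache cleans. */
struct fault_stats {
	unsigned int dcache_cleans;
};

/* Per the commit message: clean only when this is not a permission fault. */
static bool needs_dcache_clean(enum fault_kind kind)
{
	return kind != FAULT_PERMISSION;
}

/* Simulated fault handler: the counter stands in for the cache maintenance. */
static void handle_stage2_fault(enum fault_kind kind, struct fault_stats *st)
{
	assert(st != NULL);

	if (needs_dcache_clean(kind))
		st->dcache_cleans++;

	/* Installing or upgrading the mapping would follow here. */
}
```

In this model a translation fault bumps the counter and a later permission fault on the same page does not, which is the behaviour change the bisect points at.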

Pretty worrying. What HW is that on?

M.

--
Jazz is not dead, it just smells funny.