Re: next/master bisection: baseline.login on rk3288-rock2-square

From: Ard Biesheuvel
Date: Thu Feb 04 2021 - 04:12:31 EST


On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
<guillaume.tucker@xxxxxxxxxxxxx> wrote:
>
> Hi Ard,
>
> Please see the bisection report below about a boot failure on
> rk3288 with next-20210203. It was also bisected on
> imx6q-var-dt6customboard with next-20210202.
>
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org, but this one
> looks valid.
>
> The kernel is most likely crashing very early on, so there's
> nothing in the logs. Please let us know if you need some help
> with debugging or trying a fix on these platforms.
>

Thanks for the report.

Mind trying the following fix?

--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -617,8 +617,10 @@ not_relocated: mov r0, #0
@ cache_clean_flush() redundant. In other cases, the clean is
@ performed by set/way and R0/R1 are ignored.
@
- mov r0, #0
- mov r1, #0
+ get_inflated_image_size r1, r2, r3
+
+ mov r0, r4 @ start of decompressed kernel
+ add r1, r1, r0 @ end of inflated image
bl cache_clean_flush

get_inflated_image_size r1, r2, r3
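
The C model below sketches what the change above does to the two clean ranges on v7. It is only an illustration. The struct, the helper names and the example sizes are hypothetical, and `r4` stands for the start of the decompressed kernel, as in the assembly. With commit 5a29552af92d, the first cache_clean_flush() call (made while the MMU is still on) gets an empty range. With the change, that call covers the inflated image again. The range handed to cache_off() is unchanged and still extends over the kernel's BSS.

```c
#include <stdint.h>

/* Half-open virtual address range [start, end). */
struct va_range {
    uint32_t start;
    uint32_t end;
};

/* First cache_clean_flush() call, made before the MMU is disabled.
 * Commit 5a29552af92d passes R0 = R1 = 0 (an empty range); the proposed
 * change passes [r4, r4 + inflated image size) instead. */
static struct va_range first_clean_range(uint32_t r4, uint32_t inflated_size,
                                         int with_fix)
{
    struct va_range r = { 0, 0 };

    if (with_fix) {
        r.start = r4;
        r.end = r4 + inflated_size;
    }
    return r;
}

/* Range passed to cache_off() and cleaned after the MMU is off (v7+):
 * the decompressed kernel followed by its BSS. Not touched by the fix. */
static struct va_range cache_off_range(uint32_t r4, uint32_t inflated_size,
                                       uint32_t bss_size)
{
    struct va_range r = { r4, r4 + inflated_size + bss_size };
    return r;
}

/* Length of a range in bytes. */
static uint32_t range_len(struct va_range r)
{
    return r.end - r.start;
}

/* Nonzero if range a lies entirely inside range b. */
static int range_within(struct va_range a, struct va_range b)
{
    return a.start >= b.start && a.end <= b.end;
}
```

As an example, take r4 = 0x8000, a 1 MiB inflated image and 64 KiB of BSS. The first clean range is 0 bytes long without the fix and 0x100000 bytes long with it. The fixed range also lies inside the 0x110000-byte range that cache_off() cleans afterwards.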




> On 04/02/2021 04:25, KernelCI bot wrote:
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> > * This automated bisection report was sent to you on the basis *
> > * that you may be involved with the breaking commit it has *
> > * found. No manual investigation has been done to verify it, *
> > * and the root cause of the problem may be somewhere else. *
> > * *
> > * If you do send a fix, please include this trailer: *
> > * Reported-by: "kernelci.org bot" <bot@xxxxxxxxxxxx> *
> > * *
> > * Hope this helps! *
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> >
> > next/master bisection: baseline.login on rk3288-rock2-square
> >
> > Summary:
> > Start: 58b6c0e507b7 Add linux-next specific files for 20210203
> > Plain log: https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.txt
> > HTML log: https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.html
> > Result: 5a29552af92d ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> >
> > Checks:
> > revert: PASS
> > verify: PASS
> >
> > Parameters:
> > Tree: next
> > URL: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > Branch: master
> > Target: rk3288-rock2-square
> > CPU arch: arm
> > Lab: lab-collabora
> > Compiler: clang-11
> > Config: multi_v7_defconfig
> > Test case: baseline.login
> >
> > Breaking commit found:
> >
> > -------------------------------------------------------------------------------
> > commit 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> > Author: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Date: Sun Jan 24 18:03:45 2021 +0100
> >
> > ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> >
> > Commit 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance
> > for v7 cores") replaced the by-set/way cache maintenance in the decompressor
> > with by-VA cache maintenance, which is more appropriate for the task at
> > hand, especially under virtualization on hosts with non-architected system
> > caches that are not affected by by-set/way maintenance at all.
> >
> > On such systems, that commit inadvertently removed the cache clean and
> > invalidate of all of the guest's memory that is performed by KVM on behalf
> > of the guest after its MMU is disabled (but only if any by-set/way cache
> > maintenance instructions were issued first). This resulted in various
> > erroneous behaviors observed by Russell, all involving the mini-stack
> > used by the core kernel's v7 boot code, which resides in BSS. It
> > seems intractable to figure out exactly what goes wrong in each of these
> > cases, but some small experiments did suggest that the lack of a cache
> > clean and invalidate *after* disabling the MMU and caches is what
> > triggers the errors, presumably because cachelines are being allocated
> > or reallocated while the first cache clean and invalidate is in progress.
> >
> > To ensure that no cache lines cover any of the data that is accessed by
> > the booting kernel with the MMU off, include the uncompressed kernel's
> > BSS region in the cache clean operation.
> >
> > Also, to ensure that no cachelines are allocated while the cache is being
> > cleaned, perform the cache clean operation *after* disabling the MMU and
> > caches when running on v7 or later, by making a tail call to the clean
> > routine from the cache_off routine. This requires passing the VA range
> > to cache_off(), which means some care needs to be taken to preserve
> > R0 and R1 across the call to cache_off().
> >
> > Since this makes the first cache clean redundant, call it with the
> > range reduced to zero. This only affects v7, as all other versions
> > ignore R0/R1 entirely.
> >
> > Link: https://lore.kernel.org/linux-arm-kernel/20210122152012.30075-1-ardb@xxxxxxxxxx
> >
> > Fixes: 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance for v7 cores")
> > Reported-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxx>
> > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Signed-off-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxx>
> >
> > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> > index d9cce7238a36..5f231b6f0d1a 100644
> > --- a/arch/arm/boot/compressed/head.S
> > +++ b/arch/arm/boot/compressed/head.S
> > @@ -609,11 +609,24 @@ not_relocated: mov r0, #0
> > mov r3, r7
> > bl decompress_kernel
> >
> > + @
> > + @ Perform a cache clean before disabling the MMU entirely.
> > + @ In cases where the MMU needs to be disabled first (v7+),
> > + @ the clean is performed again by cache_off(), using by-VA
> > + @ operations on the range [R0, R1], making this prior call to
> > + @ cache_clean_flush() redundant. In other cases, the clean is
> > + @ performed by set/way and R0/R1 are ignored.
> > + @
> > + mov r0, #0
> > + mov r1, #0
> > + bl cache_clean_flush
> > +
> > get_inflated_image_size r1, r2, r3
> > + ldr r2, =_kernel_bss_size
> > + add r1, r1, r2
> >
> > - mov r0, r4 @ start of inflated image
> > - add r1, r1, r0 @ end of inflated image
> > - bl cache_clean_flush
> > + mov r0, r4 @ start of decompressed kernel
> > + add r1, r1, r0 @ end of kernel BSS
> > bl cache_off
> >
> > #ifdef CONFIG_ARM_VIRT_EXT
> > @@ -1124,12 +1137,14 @@ proc_types:
> > * reading the control register, but ARMv4 does.
> > *
> > * On exit,
> > - * r0, r1, r2, r3, r9, r12 corrupted
> > + * r0, r1, r2, r3, r9, r10, r11, r12 corrupted
> > * This routine must preserve:
> > * r4, r7, r8
> > */
> > .align 5
> > cache_off: mov r3, #12 @ cache_off function
> > + mov r10, r0
> > + mov r11, r1
> > b call_cache_fn
> >
> > __armv4_mpu_cache_off:
> > @@ -1176,7 +1191,9 @@ __armv7_mmu_cache_off:
> > mcr p15, 0, r0, c7, c5, 6 @ invalidate BTC
> > mcr p15, 0, r0, c7, c10, 4 @ DSB
> > mcr p15, 0, r0, c7, c5, 4 @ ISB
> > - mov pc, lr
> > +
> > + mov r0, r10
> > + b __armv7_mmu_cache_flush
> >
> > /*
> > * Clean and flush the cache to maintain consistency.
> > -------------------------------------------------------------------------------
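
The commit message makes two claims about ranges. First, passing a zero-length range makes the first v7 clean a no-op. Second, adding `_kernel_bss_size` extends the range cleaned after cache_off() over the BSS. Both can be checked with a small C model. It assumes a fixed 64-byte line size and made-up addresses, and the helper names are hypothetical. This is not the decompressor's actual loop, which is written in assembly and determines the line size at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed D-cache line size, for illustration only. */
#define LINE_SIZE 64u

/* Model of a by-VA clean+invalidate over the half-open range
 * [start, end): round start down to a line boundary, then count every
 * cache line that overlaps the range. An empty range touches no lines. */
static size_t lines_cleaned(uint32_t start, uint32_t end)
{
    size_t n = 0;

    if (end <= start)
        return 0;
    /* 64-bit cursor so the loop cannot wrap at the top of the 32-bit VA space */
    for (uint64_t va = start & ~(uint32_t)(LINE_SIZE - 1u); va < end;
         va += LINE_SIZE)
        n++;
    return n;
}

/* Nonzero if [start, end) contains addr, e.g. a BSS address such as the
 * early boot mini-stack. */
static int covers(uint32_t start, uint32_t end, uint32_t addr)
{
    return addr >= start && addr < end;
}
```

For instance, `lines_cleaned(0, 0)` is 0. An address just past the end of the inflated image, which is where BSS starts, is covered only once `_kernel_bss_size` is added to the end of the range.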
> >
> >
> > Git bisection log:
> >
> > -------------------------------------------------------------------------------
> > git bisect start
> > # good: [62c31574cdb770c78f67e7aa6e0b0244ad122901] Merge tag 'imx-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
> > git bisect good 62c31574cdb770c78f67e7aa6e0b0244ad122901
> > # bad: [58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6] Add linux-next specific files for 20210203
> > git bisect bad 58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6
> > # bad: [18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2] Merge remote-tracking branch 'net-next/master'
> > git bisect bad 18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2
> > # bad: [58d92989a8d24b6aaaabee52624d891b5103e04a] Merge remote-tracking branch 'parisc-hd/for-next'
> > git bisect bad 58d92989a8d24b6aaaabee52624d891b5103e04a
> > # bad: [b0b5c935b4dcf824ef30f6ddf719b49f729c2795] Merge remote-tracking branch 'sound-current/for-linus'
> > git bisect bad b0b5c935b4dcf824ef30f6ddf719b49f729c2795
> > # good: [d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct page for holes in memory layout
> > git bisect good d3921cb8be29ce5668c64e23ffdaeec5f8c69399
> > # good: [c64396cc36c6e60704ab06c1fb1c4a46179c9120] Merge tag 'locking-urgent-2021-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good c64396cc36c6e60704ab06c1fb1c4a46179c9120
> > # good: [2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a] Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block
> > git bisect good 2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a
> > # good: [88bb507a74ea7d75fa49edd421eaa710a7d80598] Merge tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> > git bisect good 88bb507a74ea7d75fa49edd421eaa710a7d80598
> > # good: [2e02677e961fd4b96d8cf106b5979e6a3cdb7362] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> > git bisect good 2e02677e961fd4b96d8cf106b5979e6a3cdb7362
> > # bad: [d3aa3465622d6d96645611b331312b773806d1a7] Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
> > git bisect bad d3aa3465622d6d96645611b331312b773806d1a7
> > # good: [245a7d47066ac0a266004110bd4d57d0d1329823] scripts: switch some more scripts explicitly to Python 3
> > git bisect good 245a7d47066ac0a266004110bd4d57d0d1329823
> > # bad: [199a427c3a3da01c5db4784a75b37251e7befa64] ARM: ensure the signal page contains defined contents
> > git bisect bad 199a427c3a3da01c5db4784a75b37251e7befa64
> > # good: [538eea5362a1179dfa7770dd2b6607dc30cc50c6] ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
> > git bisect good 538eea5362a1179dfa7770dd2b6607dc30cc50c6
> > # bad: [d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956] ARM: decompressor: tidy up register usage
> > git bisect bad d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956
> > # bad: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> > git bisect bad 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> > # first bad commit: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> > -------------------------------------------------------------------------------
> >
> >
>