Re: next/master bisection: baseline.login on rk3288-rock2-square

From: Ard Biesheuvel
Date: Thu Feb 04 2021 - 11:03:46 EST


On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
<guillaume.tucker@xxxxxxxxxxxxx> wrote:
>
> On 04/02/2021 15:42, Ard Biesheuvel wrote:
> > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> > <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> >>
> >> On 04/02/2021 10:33, Guillaume Tucker wrote:
> >>> On 04/02/2021 10:27, Ard Biesheuvel wrote:
> >>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
> >>>> <linux@xxxxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
> >>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
> >>>>>> <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> Hi Ard,
> >>>>>>>
> >>>>>>> Please see the bisection report below about a boot failure on
> >>>>>>> rk3288 with next-20210203. It was also bisected on
> >>>>>>> imx6q-var-dt6customboard with next-20210202.
> >>>>>>>
> >>>>>>> Reports aren't automatically sent to the public while we're
> >>>>>>> trialing new bisection features on kernelci.org but this one
> >>>>>>> looks valid.
> >>>>>>>
> >>>>>>> The kernel is most likely crashing very early on, so there's
> >>>>>>> nothing in the logs. Please let us know if you need some help
> >>>>>>> with debugging or trying a fix on these platforms.
> >>>>>>>
> >>>>>>
> >>>>>> Thanks for the report.
> >>>>>
> >>>>> Ard,
> >>>>>
> >>>>> I want to send my fixes branch today which includes your regression
> >>>>> fix that caused this regression.
> >>>>>
> >>>>> As this is proving difficult to fix, I can only drop your fix from
> >>>>> my fixes branch - and given that this seems to be problematical, I'm
> >>>>> tempted to revert the original change at this point which should fix
> >>>>> both of these regressions - and then we have another go at getting rid
> >>>>> of the set/way instructions during the next cycle.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>
> >>>> Hi Russell,
> >>>>
> >>>> If Guillaume is willing to do the experiment, and it fixes the issue,
> >>>
> >>> Yes, I'm running some tests with that fix now and should have
> >>> some results shortly.
> >>
> >> Yes it does fix the issue:
> >>
> >> https://lava.collabora.co.uk/scheduler/job/3173819
> >>
> >> with Ard's fix applied to this test branch:
> >>
> >> https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
> >>
> >>
> >> +clang +Nick
> >>
> >> It's worth mentioning that the issue only happens with kernels
> >> built with Clang. As you can see there are several other arm
> >> platforms failing with clang-11 builds but booting fine with
> >> gcc-8:
> >>
> >> https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
> >>
> >> Here's a sample build log:
> >>
> >> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
> >>
> >> Essentially:
> >>
> >> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> >>
> >> I believe it should be using the GNU assembler as LLVM_IAS=1 is
> >> not defined, but there may be something more subtle about it.
> >>
> >
> >
> > Do you have a link for a failing zImage built from multi_v7_defconfig?
>
> Sure, this one was built from a plain next-20210203:
>
> http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage
>
> You can also find the dtbs, modules and other things in that same
> directory.
>
> For the record, here's the test job that used it:
>
> https://lava.collabora.co.uk/scheduler/job/3173792
>

Thanks.

That zImage boots fine locally. Unfortunately, I don't have rk3288
hardware to reproduce the failure.

Could you please point me to the list of all the other platforms that
failed to boot this image?

To be honest, I am slightly annoyed that a change that works fine with
GCC but does not work with Clang version

11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158

(where exp means experimental, I suppose) is the reason for this
discussion, especially because the change is in asm code. Is it
possible to build with Clang but use the GNU linker?