Re: [PATCH v4] Makefile.compiler: replace cc-ifversion with compiler-specific macros

From: Masahiro Yamada
Date: Tue Jun 20 2023 - 00:19:48 EST


On Mon, Jun 12, 2023 at 7:10 PM Shreeya Patel
<shreeya.patel@xxxxxxxxxxxxx> wrote:
>
> Hi Masahiro,
>
>
> On 24/05/23 02:57, Nick Desaulniers wrote:
> > On Tue, May 23, 2023 at 3:27 AM Shreeya Patel
> > <shreeya.patel@xxxxxxxxxxxxx> wrote:
> >> Hi Nick and Masahiro,
> >>
> >> On 23/05/23 01:22, Nick Desaulniers wrote:
> >>> On Mon, May 22, 2023 at 9:52 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>> On Mon, May 22, 2023 at 12:09:34PM +0200, Ricardo Cañuelo wrote:
> >>>>> On Fri, May 19 2023 at 08:57:24, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> >>>>>> It could be; if the link order was changed, it's possible that this
> >>>>>> target may be hitting something along the lines of:
> >>>>>> https://isocpp.org/wiki/faq/ctors#static-init-order i.e. the "static
> >>>>>> initialization order fiasco"
> >>>>>>
> >>>>>> I'm struggling to think of how this appears in C codebases, but I
> >>>>>> swear years ago I had a discussion with GKH (maybe?) about this. I
> >>>>>> think I was playing with converting Kbuild to use Ninja rather than
> >>>>>> Make; the resulting kernel image wouldn't boot because I had modified
> >>>>>> the order the object files were linked in. If you were to randomly
> >>>>>> shuffle the object files in the kernel, I recall some hazard that may
> >>>>>> prevent boot.
> >>>>> I thought that was specifically a C++ problem? But then again, the
> >>>>> kernel docs explicitly say that the ordering of obj-y goals in kbuild is
> >>>>> significant in some instances [1]:
> >>>> Yes, it matters, you can not change it. If you do, systems will break.
> >>>> It is the only way we have of properly ordering our init calls within
> >>>> the same "level".
> >>> Ah, right it was the initcall ordering. Thanks for the reminder.
> >>>
> >>> (There's a joke in there similar to the use of regexes to solve a
> >>> problem resulting in two new problems; initcalls have levels for
> >>> ordering, but we still have (unexpressed) dependencies between calls
> >>> of the same level; brittle!).
> >>>
> >>> +Maksim, since that might be relevant info for the BOLT+Kernel work.
> >>>
> >>> Ricardo,
> >>> https://elinux.org/images/e/e8/2020_ELCE_initcalls_myjosserand.pdf
> >>> mentions that there's a kernel command line param `initcall_debug`.
> >>> Perhaps that can be used to see if
> >>> 5750121ae7382ebac8d47ce6d68012d6cd1d7926 somehow changed initcall
> >>> ordering, resulting in a config that cannot boot?
> >>
> >> Here are the links to the LAVA jobs run with initcall_debug added to
> >> the kernel command line.
> >>
> >> 1. Where regression happens (5750121ae7382ebac8d47ce6d68012d6cd1d7926)
> >> https://lava.collabora.dev/scheduler/job/10417706
> >>
> >> 2. With a revert of the commit 5750121ae7382ebac8d47ce6d68012d6cd1d7926
> >> https://lava.collabora.dev/scheduler/job/10418012
> > Thanks!
> >
> > Yeah, I can see a diff in the initcall ordering as a result of
> > commit 5750121ae738 ("kbuild: list sub-directories in ./Kbuild")
> >
> > https://gist.github.com/nickdesaulniers/c09db256e42ad06b90842a4bb85cc0f4
> >
> > Not just different orderings, but some initcalls seem unique to the
> > before vs. after, which is troubling (for example, init_events and
> > init_fs_sysctls, respectively).
> >
> > That isn't conclusive evidence that changes to initcall ordering are
> > to blame, but I suspect confirming that precisely to be very very time
> > consuming.
> >
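
For reference, a comparison like the one described above can be sketched
as follows. This is only a minimal sketch, not the script used for the
gist: it assumes the boot logs were captured with initcall_debug and
contain the usual "calling  <symbol>+0x<off>/0x<size> @ <pid>" lines.
The names initcall_sequence() and compare() are hypothetical helpers
introduced here for illustration.

```python
import re

# With initcall_debug, the kernel logs one line per initcall, e.g.
#   [    0.123456] calling  some_init+0x0/0x7c @ 1
# (offset and size above are made-up values for illustration).
CALLING_RE = re.compile(
    r"calling\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def initcall_sequence(log_text):
    """Return the initcall names in the order they were invoked."""
    return [m.group(1) for m in CALLING_RE.finditer(log_text)]


def compare(before_log, after_log):
    """Summarize how two initcall_debug boot logs differ.

    Hypothetical helper: reports initcalls seen in only one log and
    the first position at which the shared initcalls run in a
    different order (None if the shared order is identical).
    """
    before = initcall_sequence(before_log)
    after = initcall_sequence(after_log)
    before_set, after_set = set(before), set(after)
    common = before_set & after_set

    # Compare only the shared initcalls, so that an initcall present
    # in one log alone is not misreported as a reordering.
    before_common = [name for name in before if name in common]
    after_common = [name for name in after if name in common]

    first_divergence = next(
        (i for i, (a, b) in enumerate(zip(before_common, after_common))
         if a != b),
        None,
    )
    return {
        "only_before": sorted(before_set - after_set),
        "only_after": sorted(after_set - before_set),
        "first_divergence": first_divergence,
    }
```

Feeding the text of the two LAVA job logs to compare() would list the
initcalls unique to each boot and show where the shared ordering first
diverges. A difference found this way shows only that the ordering
changed, not that the change is what breaks the boot.
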
> > Masahiro, what are your thoughts on reverting 5750121ae738? There are
> > conflicts in Kbuild and Makefile when reverting 5750121ae738 on
> > mainline.
>
> I'm not sure if you followed the conversation but we are still seeing
> this regression with the latest kernel builds and would like to know if
> you plan to revert 5750121ae738?


Reverting 5750121ae738 does not solve the issue,
because the issue already occurs before 5750121ae738:
multi_v7_defconfig + debug.config + CONFIG_MODULES=n
fails to boot in the same way.

The revert would only hide the issue on one particular build setup.


I submitted a patch to pinpoint the issue more precisely.
Let's see how it goes.
https://lore.kernel.org/lkml/ZJEni98knMMkU%2Fcl@xxxxxxxxxxxxxxxxxx/T/#t


(BTW, the initcall order is unrelated)





>
>
> Thanks,
> Shreeya Patel
>
> >>
> >> Thanks,
> >> Shreeya Patel
> >>
> >

--
Best Regards
Masahiro Yamada