Re: [RFC] [kbuild test robot] random-order parallel building

From: Masahiro Yamada
Date: Fri Jun 09 2023 - 12:00:23 EST


On Fri, Jun 9, 2023 at 5:41 PM Liu, Yujie <yujie.liu@xxxxxxxxx> wrote:
>
> Hi Masahiro,
>
> On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
> > On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> > > Hello, maintainers of the kbuild test robot.
> > >
> > > I have a proposal for the 0day tests.
> >
> > Thanks a lot for the proposal for the shuffle make, we will do some
> > investigation to try this random order parallel build. The gnu make
> > we currently use is 4.3, we will try the 4.4 to see any problem.
> >
> > For the timeline, we may provide update later this month.
>
> We've upgraded to make v4.4.1 in kernel test robot and enabled random-
> order parallel compiling in our randconfig build tests. The shuffle
> seed is generated by hashing the randconfig, so it changes over time and
> can cover various random orders. We are still doing some internal
> testing and will put it online once everything is done.
>
> > >
> > >
> > > GNU Make traditionally processes the dependency from left to right.
> > >
> > > For example, if you have dependency like this:
> > >
> > > all: foo bar baz
> > >
> > > GNU Make builds foo, bar, baz, in this order.
> > >
> > >
> > > Some projects that are not capable of parallel builds
> > > rely on that behavior implicitly.
> > >
> > > Kbuild, however, is intended to work well in parallel.
> > > (As the maintainer, I really care about it.)
> > >
> > >
> > > From time to time, people add "just worked for me" code,
> > > but apparently that lacks proper dependency.
> > > Sometimes it requires an expensive CPU to reproduce
> > > parallel build issues.
> > >
> > >
> > > For example, see this report,
> > > https://lkml.org/lkml/2016/11/30/587
> > >
> > > The report says 'make -j112' reproduces the broken parallel build.
> > > Most people do not have such a build machine that comes with 112
> > > cores.
> > > It is difficult to reproduce it (or even notice it).
> > >
> > > (Some time later, it was root-caused by 07a422bb213a)
>
> Thanks a lot for sharing this case. We tried to reproduce it, but it
> looks like it dates back to v4.9-rc7 and throws other errors when
> compiling in our kbuild env, so we are not able to reproduce it yet.
> Not sure if it is related to the toolchain/compiler version or the
> kernel config.
>
> This case mentioned that 'make -j112' can reproduce the breakage. We
> assume this is under traditional serial order build. Does it imply that
> it is likely to take far fewer parallel jobs to reproduce the breakage
> when shuffle is set, say 'make --shuffle=SEED -j32', so developers are
> able to reproduce it on an ordinary CPU with fewer cores?


I think --shuffle will help a build machine with fewer cores
catch issues, but it is not a full randomization.

In my understanding, --shuffle still traverses depth-first.


Consider this example.


all: foo bar

foo: foo-sub

bar: bar-sub


Only one of [1] or [2] happens.

[1] foo-sub -> foo -> bar-sub -> bar -> all
[2] bar-sub -> bar -> foo-sub -> foo -> all



foo-sub -> bar-sub -> bar -> foo -> all

is also a valid order, but --shuffle never schedules it that way.
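
To make the point concrete, here is a minimal sketch in Python. It is only
a model of my understanding above, not GNU Make's implementation: it
assumes a serial build, assumes --shuffle permutes each prerequisite list
and nothing else, and the function names are made up for illustration. It
compares the orders reachable that way with every order the dependencies
would allow.

```python
from itertools import permutations, product

# Prerequisite lists from the example Makefile.
DEPS = {
    "all": ["foo", "bar"],
    "foo": ["foo-sub"],
    "bar": ["bar-sub"],
}


def depth_first_order(target, deps, done=None):
    """Serial order when each prerequisite is finished before the next starts."""
    if done is None:
        done = []
    for prereq in deps.get(target, []):
        if prereq not in done:
            depth_first_order(prereq, deps, done)
    done.append(target)
    return done


def shuffled_orders(goal, deps):
    """Orders reachable by permuting every prerequisite list (assumed model)."""
    targets = list(deps)
    orders = set()
    for combo in product(*(permutations(deps[t]) for t in targets)):
        shuffled = dict(zip(targets, (list(p) for p in combo)))
        orders.add(tuple(depth_first_order(goal, shuffled)))
    return orders


def valid_orders(deps):
    """Every order that satisfies the dependencies (all topological orders)."""
    nodes = set(deps) | {p for prereqs in deps.values() for p in prereqs}
    result = set()
    for perm in permutations(sorted(nodes)):
        pos = {name: i for i, name in enumerate(perm)}
        if all(pos[p] < pos[t] for t, prereqs in deps.items() for p in prereqs):
            result.add(perm)
    return result


reachable = shuffled_orders("all", DEPS)
every = valid_orders(DEPS)
print(len(reachable), "reachable of", len(every), "valid orders")
for order in sorted(every - reachable):
    print("never scheduled:", " -> ".join(order))
```

Under these assumptions, 2 of the 6 valid orders are reachable, and
foo-sub -> bar-sub -> bar -> foo -> all is among the four that are not.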






> Not sure if there are other known cases of parallel build breakage
> (especially in recent kernels). If any, it would be very kind if you
> could also share them. We can first try reproducing them in the bot to
> confirm our test flow works well.

I do not remember any other real breakage.

>
> Another question is about bisection. Say the bot catches a breakage on
> commit1 that is root-caused by an earlier commit2. If we keep the
> options "--shuffle=<seed> -j<jobs>" consistent during the whole
> bisection, will the breakage show up 100% of the time on every commit
> between commit2 and commit1, or might it fail to reproduce on some
> commits during bisection?


I am not sure, but I _guess_ git-bisect may not point to commit2
if there is a Makefile change in between.



commit2 (root cause)
-> commitA (add Makefile change)
-> commit1 (0 day bot noticed an issue here)


Even if the same --shuffle=SEED is given, the issue may not be
reproducible on commit2..commitA if commitA changes a Makefile,
since the same seed can then yield a different build order.


Thanks for considering this.




> Thanks a lot for this parallel building proposal, and we will keep
> updating the status.
>
> --
> Best Regards,
> Yujie Liu
>
> > >
> > >
> > > GNU Make 4.4 got this option.
> > >
> > > --shuffle[={SEED|random|reverse|none}]
> > > Perform shuffle of prerequisites and goals.
> > >
> > >
> > >
> > > 'make --shuffle=reverse' will build in reverse order.
> > > In the example above, baz, bar, foo.
> > >
> > > 'make --shuffle' will randomize the build order.
> > >
> > >
> > > If there exists a missing dependency among foo, bar, baz,
> > > it will fail to build.
> > >
> > >
> > >
> > > We already perform the randconfig on a daily basis.
> > > So, random-order parallel building is a similar idea.
> > >
> > > Perhaps, it makes sense to add the "--shuffle=SEED" option
> > > but it requires GNU Make 4.4. (or GNU Make 4.4.1)
> > > Is this too new?
> >
> > Our production environment is 4.3 right now. It will take extra
> > time for us to upgrade the environment but it's doable for us.
> >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Masahiro Yamada
> >
>


--
Best Regards
Masahiro Yamada