Re: [RFC] [kbuild test robot] random-order parallel building

From: Liu, Yujie
Date: Fri Jun 09 2023 - 04:41:29 EST


Hi Masahiro,

On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
> On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> > Hello, maintainers of the kbuild test robot.
> >
> > I have a proposal for the 0day tests.
>
> Thanks a lot for the proposal for the shuffle make, we will do some
> investigation to try this random order parallel build. The gnu make
> we currently use is 4.3, we will try the 4.4 to see any problem.
>
> For the timeline, we may provide update later this month.

We've upgraded to make v4.4.1 in the kernel test robot and enabled
random-order parallel compiling in our randconfig build tests. The
shuffle seed is generated by hashing the randconfig, so it changes over
time and covers a variety of build orders. We are still doing some
internal testing and will put it online once everything is done.
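
To make the seed derivation concrete, here is a rough sketch of the
idea. It is not the bot's actual code: the hash function (SHA-256), the
32-bit truncation, and the helper names shuffle_seed() and
make_command() are all assumptions made for illustration.

```python
import hashlib


def shuffle_seed(config_text: str) -> int:
    """Derive a stable numeric seed from the contents of a .config.

    Assumption: SHA-256 truncated to 32 bits; any stable hash would do.
    """
    digest = hashlib.sha256(config_text.encode("utf-8")).digest()
    # Keep the first 4 bytes so the seed is a plain unsigned integer.
    return int.from_bytes(digest[:4], "big")


def make_command(config_text: str, jobs: int) -> list:
    """Build the make invocation for one randconfig (hypothetical helper)."""
    return ["make", f"--shuffle={shuffle_seed(config_text)}", f"-j{jobs}"]
```

The same config always maps to the same seed, so a failing build can be
rerun with the identical order, while a new randconfig gives a new
order.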

> >
> >
> > GNU Make traditionally processes the dependency from left to right.
> >
> > For example, if you have dependency like this:
> >
> >      all: foo bar baz
> >
> > GNU Make builds foo, bar, baz, in this order.
> >
> >
> > Some projects that are not capable of parallel builds
> > rely on that behavior implicitly.
> >
> > Kbuild, however, is intended to work well in parallel.
> > (As the maintainer, I really care about it.)
> >
> >
> > From time to time, people add "just worked for me" code,
> > but apparently that lacks proper dependency.
> > Sometimes it requires an expensive CPU to reproduce
> > parallel build issues.
> >
> >
> > For example, see this report,
> >   https://lkml.org/lkml/2016/11/30/587
> >
> > The report says 'make -j112' reproduces the broken parallel build.
> > Most people do not have such a build machine that comes with 112
> > cores.
> > It is difficult to reproduce it (or even notice it).
> >
> > (Some time later, it was root-caused by 07a422bb213a)

Thanks a lot for sharing this case. We tried to reproduce it, but it
dates back to v4.9-rc7 and hits other errors when compiled in our
kbuild environment, so we have not been able to reproduce it yet. We
are not sure whether those errors are related to the toolchain/compiler
version or to the kernel config.

The report says that 'make -j112' reproduces the breakage. We assume
this was with the traditional (unshuffled) build order. Does that imply
that far fewer parallel jobs are likely to be enough to reproduce the
breakage when shuffle is set, say 'make --shuffle=SEED -j32', so that
developers can reproduce it on an ordinary CPU with fewer cores?
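
Our rough mental model behind this question, which may well be wrong,
is sketched below. It is a toy model only, not how make is implemented:
it assumes a strictly serial build of "all: foo bar baz" in which bar
silently needs foo's output but does not declare it. The names PREREQS,
HIDDEN_DEP, breaks() and failing_orders() are made up for illustration.

```python
from itertools import permutations

PREREQS = ["foo", "bar", "baz"]
# Hypothetical undeclared dependency: bar needs foo to be built first.
HIDDEN_DEP = ("bar", "foo")


def breaks(order):
    """True if the dependent target is built before the thing it needs."""
    dependent, needed = HIDDEN_DEP
    return order.index(dependent) < order.index(needed)


def failing_orders(prereqs=PREREQS):
    """All serial build orders that would expose the missing dependency."""
    return [order for order in permutations(prereqs) if breaks(order)]
```

In this toy model the traditional order (foo, bar, baz) never fails,
while half of the possible orders do, even without any parallelism.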

We are not sure whether there are other known cases of parallel build
breakage (especially in recent kernels). If there are, we would
appreciate it if you could share them. We can first try reproducing
them in the bot to confirm that our test flow works well.

Another question is about bisection. Say the bot catches a breakage on
commit1 that is root-caused to an earlier commit2. If we keep the
options "--shuffle=<seed> -j<jobs>" unchanged during the whole
bisection, will the breakage show up on every commit between commit2
and commit1, or is it only likely, but not guaranteed, to reproduce on
each commit visited during bisection?
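
For reference, this is roughly what we mean by keeping the options
consistent. It is a sketch of a check that "git bisect run" could call,
not our production flow; SEED, JOBS and the function names are
placeholders. It only pins the options and does not by itself answer
whether the result is deterministic on every commit.

```python
import subprocess

SEED = 12345  # placeholder: the seed from the originally failing build
JOBS = 32     # placeholder: the job count from the originally failing build


def build_command(seed, jobs):
    """Same make options for every commit visited by the bisection."""
    return ["make", f"--shuffle={seed}", f"-j{jobs}"]


def bisect_status(returncode):
    """Map make's result to a 'git bisect run' exit code.

    git bisect run treats 0 as good and 1-127 (except 125) as bad;
    125 would mean "skip this commit" and is deliberately not used here.
    """
    return 0 if returncode == 0 else 1


def check_commit(seed=SEED, jobs=JOBS):
    """Build the checked-out commit and return the bisect exit code."""
    result = subprocess.run(build_command(seed, jobs))
    return bisect_status(result.returncode)
```

A wrapper script would exit with the value returned by check_commit().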

Thanks a lot for this parallel building proposal, and we will keep
updating the status.

--
Best Regards,
Yujie Liu

> >
> >
> > GNU Make 4.4 got this option.
> >
> >   --shuffle[={SEED|random|reverse|none}]
> >        Perform shuffle of prerequisites and goals.
> >
> >
> >
> > 'make --shuffle=reverse' will build in reverse order.
> > In the example above, baz, bar, foo.
> >
> > 'make --shuffle' will randomize the build order.
> >
> >
> > If there exists a missing dependency among foo, bar, baz,
> > it will fail to build.
> >
> >
> >
> > We already perform the randconfig daily basis.
> > So, random-order parallel building is a similar idea.
> >
> > Perhaps, it makes sense to add the "--shuffle=SEED" option
> > but it requires GNU Make 4.4.  (or GNU Make 4.4.1)
> > Is this too new?
>
> Our production environment is 4.3 right now. It will take extra
> time for us to upgrade the environment but it's doable for us.
>
> >
> >
> >
> > --
> > Best Regards
> > Masahiro Yamada
>