Re: mainline build failure due to cf21f328fcaf ("media: nxp: Add i.MX8 ISI driver")

From: Hans Verkuil
Date: Wed May 10 2023 - 09:17:11 EST


On 10/05/2023 10:05, Mauro Carvalho Chehab wrote:
> Hi Linus,
>
> Em Mon, 8 May 2023 09:27:28 -0700
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> escreveu:
>
>> On Mon, May 8, 2023 at 3:55 AM Linux regression tracking #adding
>> (Thorsten Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
>>>
>>> Thanks for the report. The fixes (see the mail from Laurent) apparently
>>> are still not mainlined (or am I missing something?), so let me add this
>>> report to the tracking to ensure this is not forgotten:
>>
>> Gaah. I was intending to apply the patch directly before rc1, but then
>> I forgot about this issue.
>>
>> Mauro: I'm currently really *really* fed up with the media tree. This
>> exact same thing happened last merge window, where the media tree
>> caused pointless build errors, and it took way too long to get the
>> fixes the proper ways.
>>
>> If something doesn't even build, it should damn well be fixed ASAP.
>>
>> Last release it was imx290.c and PM support being disabled, and I had
>> to apply the fix manually because it continued to not come in the
>> proper way.
>>
>> See commit 7b50567bdcad ("media: i2c: imx290: fix conditional function
>> defintions").
>>
>> But also see commit b928db940448 ("media: i2c: imx290: fix conditional
>> function definitions"), which you *did* commit, but note this on that
>> commit:
>>
>> AuthorDate: Tue Feb 7 17:13
>> CommitDate: Sat Mar 18 08:44
>>
>> so it took you a MONTH AND A HALF to react to a build failure.
>>
>> And see this:
>>
>> git name-rev b928db940448
>> b928db940448 tags/v6.4-rc1~161^2~458
>>
>> ie that build fix that you finally committed came in *AFTER* the 6.3
>> release, even though the bug it fixes was introduced in the 6.3 merge
>> window:
>>
>> git name-rev 02852c01f654
>> 02852c01f654 tags/v6.3-rc1~72^2~2^2~193
>>
>> and now we're in the *EXACT*SAME* situation, with me applying a build
>> fix directly, because you couldn't get it fixed in a timely manner.
>
> Sorry for the mess. I'll work to improve the process to prevent this
> from happening again.
>
> FYI, in order to reduce build issues, we have a Jenkins instance
> doing builds with gcc and clang on the media stage tree, before we even
> merge patches into the main media development tree. They run with
> allyesconfig for the x86_64 arch, with W=1:
>
> https://builder.linuxtv.org/job/media_stage_clang/
> https://builder.linuxtv.org/job/media_stage_gcc/
>
> And another CI job tests for bisect breakages as I receive pull requests,
> applying them patch by patch and using both allyesconfig and allmodconfig,
> also on the x86_64 arch with W=1:
>
> https://builder.linuxtv.org/job/patchwork/
>
> The rule is not to merge anything into the media tree if any of those
> jobs fails. I also fast-track the merging of patches whose subject
> states that they fix a build failure.
>
> To help with that, I normally take one week to merge patches from
> media_stage into media_tree, rebasing media_stage if needed to avoid
> git bisect build breakages in media_tree (which is where I send my
> update PRs to you from).
>
> Unfortunately, we currently don't have the resources to run multiple
> randconfig builds on Jenkins, as the build machines on the server are
> very slow. Still, I'll add a build with CONFIG_PM disabled to the test
> set, as it seems to be a recurrent source of trouble these days. I'll
> also try to identify a couple of other randconfigs that would help to
> catch problems like that earlier. If any other problematic Kconfig
> variables come to mind, please feel free to suggest them for us to add
> to the CI automation.
>
> -
>
> In the specific case of this fixup patch, I didn't identify it as a build
> issue, so it followed the usual workflow. We have a huge number of patches
> for media, and it usually takes some time to handle all of them. This one
> just followed the normal flow, as it didn't break the Jenkins builds and
> its subject didn't mention anything about build breakage.

In the end it was my fault: I pushed the fix to our staging tree thinking
that there was enough time for it to be included in the PR for 6.4.
But I was wrong: the window for that closed a week earlier (which Mauro
even documented!). So Mauro never knew that this patch had to be included
in the PR to you. The right procedure would have been for me to tell Mauro
about this patch. Hopefully this will be the first and also last time that
I make that mistake.

We do have a major problem with too many incoming patches and not enough
maintainers & time. Some of it can be improved with better procedures and
testing, but that won't help the often slow code review times. It will be a
big topic during the upcoming media mini summit in Prague.

Regards,

Hans