Re: [PATCH v11] drm: Add initial ci/ subdirectory

From: Jani Nikula
Date: Wed Aug 30 2023 - 15:23:55 EST

Next message: Jason Gunthorpe: "Re: [PATCH v7 1/1] vfio/nvgpu: Add vfio pci variant module for grace hopper"
Previous message: Michał Mirosław: "[PATCH v2 7/7] regulator/core: regulator_lock_two: remove duplicate locking code"
In reply to: Maxime Ripard: "Re: [PATCH v11] drm: Add initial ci/ subdirectory"
Next in thread: Maxime Ripard: "Re: [PATCH v11] drm: Add initial ci/ subdirectory"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 30 Aug 2023, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> On Tue, Aug 22, 2023 at 04:26:06PM +0200, Daniel Vetter wrote:
>> On Fri, Aug 11, 2023 at 02:19:53PM -0300, Helen Koike wrote:
>> > From: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx>
>> >
>> > Developers can easily execute several tests on different devices
>> > by just pushing their branch to their fork in a repository hosted
>> > on gitlab.freedesktop.org which has an infrastructure to run jobs
>> > in several runners and farms with different devices.
>> >
>> > There are also other automated tools that uprev dependencies,
>> > monitor the infra, and so on that are already used by the Mesa
>> > project, and we can reuse them too.
>> >
>> > Also, store expectations about what the DRM drivers are supposed
>> > to pass in the IGT test suite. By storing the test expectations
>> > along with the code, we can make sure both stay in sync with each
>> > other so we can know when a code change breaks those expectations.
>> >
>> > Also, include a configuration file that points to the out-of-tree
>> > CI scripts.
>> >
>> > This will allow all contributors to drm to reuse the infrastructure
>> > already in gitlab.freedesktop.org to test the driver on several
>> > generations of the hardware.
>> >
>> > Signed-off-by: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx>
>> > Signed-off-by: Helen Koike <helen.koike@xxxxxxxxxxxxx>
>> > Acked-by: Daniel Stone <daniels@xxxxxxxxxxxxx>
>> > Acked-by: Rob Clark <robdclark@xxxxxxxxx>
>> > Tested-by: Rob Clark <robdclark@xxxxxxxxx>
>>
>> Ok I pushed this into a topic/drm-ci branch in drm.git and asked sfr to
>> include that branch in linux-next.
>>
>> But also I'd like to see a lot more acks here, we should be able to at
>> least pile up a bunch of (driver) maintainers from drm-misc in support of
>> this. Also maybe media, at least I've heard noises that they're maybe
>> interested too? Plus anyone else, the more the better.
>
> I'm not really convinced by that approach at all, and most of the issues
> I see are shown by the follow-up series here:

I'm not fully convinced either, more like "let's see". In that narrow
sense, ack. I don't see harm in trying, if you're also open to backing
off in case it does not pan out.

> https://lore.kernel.org/dri-devel/20230825122435.316272-1-vignesh.raman@xxxxxxxxxxxxx/
>
> * We hardcode a CI farm setup into the kernel
>
> * We cannot trust that the code being run is actually the one being
> pushed into gitlab
>
> * IMO, and I know we disagree here, any IGT test we enable for a given
> platform should work, period. Allowing failures and flaky tests just
> sweeps whatever issue is there under the rug. If the test is at
> fault, we should fix the test, if the driver / kernel is at fault,
> then I certainly want to know about it.

At least for display, where this also depends on peripheral hardware,
it's not an easy problem, really. How reliable do you need it to be?
How many nines? Who is going to debug the issues that need hundreds or
thousands of runs to reproduce? If a commit makes some test less
reliable, how long is it going to take to even see that or pinpoint
that?

It's a kind of cop out, but this is not filesystems. In many cases I
think we might be able to make things more robust by failing faster and
failing more, but the users probably want us to plunge forward despite
some errors to try to get something on screen.

(Come to think of it, perhaps we should classify tests based on whether
external hardware plays a role.)

So I'm not so concerned about the filter lists per se, but rather about
having them in kernel.

BR,
Jani.

>
> * This then leads to patches like this one:
> https://lore.kernel.org/dri-devel/20230825122435.316272-6-vignesh.raman@xxxxxxxxxxxxx/
>
> Which (and it's definitely not the author's fault) are just plain
> unreadable, and not reproducible or auditable by anyone not heavily
> involved in the CI farm operations and the platforms being tested.
>
> That being said, I don't have anything better to suggest than what I
> already did, and it looks like I'm alone in thinking that those are
> problems, so feel free to add my ack if you want to.
>
> Maxime

--
Jani Nikula, Intel Open Source Graphics Center
