Re: Kernel regression tracking/reporting initiatives and KCIDB

From: Guillaume Tucker
Date: Thu Aug 17 2023 - 09:33:46 EST


Hi Ricardo,

On 01/08/2023 13:47, Ricardo Cañuelo wrote:
> Hi all,
>
> I'm Ricardo from Collabora. In the past months, we’ve been analyzing the
> current status of CI regression reporting and tracking in the Linux
> kernel: assessing the existing tools, testing their functionalities,
> collecting ideas about desirable features that aren’t available yet and
> sketching some of them.
>
> As part of this effort, we wrote a Regression Tracker tool [1] as a
> proof of concept. It’s a rather simple tool that takes existing
> regression data and reports and uses them to show more context on each
> reported regression: the relationships between regressions, whether
> they may have been caused by an infrastructure error, and other
> metadata about their current status. We’ve been using it
> mostly as a playground for us to explore the current status of the
> functionalities provided by CI systems and to test ideas about new
> features.
>
> We’re also checking other tools and services provided by the community,
> such as regzbot [2], collaborating with them when possible and thinking
> about how to combine multiple scattered efforts by different people
> towards the same common goal. As a first step, we’ve contributed to
> regzbot and partially integrated its results into the Regression Tracker
> tool.
>
> So far, we’ve been using the KernelCI regression data and reports as a
> data source; we're now wondering if we could tackle the problem with a
> more general approach by building on top of what KCIDB already provides.

As the new KernelCI API is ramping up, we're now starting a
discussion about how to address the issue of having two APIs in
KernelCI. There are several ways to solve this, but essentially
I think we agree we would like to have one main database and one
new web dashboard showing this data. With the new API, data is
owned by the users who submit it so we can effectively provide a
solution for grouping data from multiple CI systems like KCIDB
does.
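To make the grouping idea concrete, here is a hypothetical sketch of what a multi-origin submission could look like. It is loosely modelled on the KCIDB I/O schema, but every field name here is illustrative rather than authoritative; the point is only that each record carries an "origin" and a namespaced id, so results from several CI systems can coexist in one database:

```python
# Hypothetical multi-origin submission, loosely inspired by the KCIDB
# I/O schema. Field names and values are illustrative, not an actual
# KernelCI or KCIDB payload.
submission = {
    "version": {"major": 4, "minor": 3},  # schema version (illustrative)
    "checkouts": [
        {
            "id": "ci_a:checkout-1",      # ids namespaced per submitter
            "origin": "ci_a",
            "git_commit_hash": "deadbeef" * 5,
        },
    ],
    "tests": [
        {
            "id": "ci_a:test-1",
            "origin": "ci_a",
            "path": "baseline.dmesg",
            "status": "FAIL",
        },
        {
            "id": "ci_b:test-1",
            "origin": "ci_b",             # a different CI system, same database
            "path": "baseline.dmesg",
            "status": "PASS",
        },
    ],
}

# Grouping results from all origins for one test path, side by side:
by_path = {}
for test in submission["tests"]:
    by_path.setdefault(test["path"], []).append((test["origin"], test["status"]))

print(by_path["baseline.dmesg"])
```

With data shaped like this, a dashboard can answer "how did every CI system fare on this test?" with a single lookup, which is the kind of cross-system view a single central database enables.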

The key thing here is that KernelCI as a project will be
providing a database with regression information collected from
any public CI system. So the topic of tracking regressions for
the whole kernel is already part of the roadmap for KernelCI, and
if just waiting for CI systems to push data is not enough, we can
have services that actively go and look for regressions to feed
them into the database under a particular category (or user).
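At its core, such a service would scan result histories for pass-to-fail transitions. A minimal sketch, assuming a hypothetical data shape (this is not an actual KernelCI API payload):

```python
# Minimal sketch of a service that "actively goes and looks for
# regressions": scan a test's result history in commit order and flag
# every transition from PASS to FAIL. The (commit, status) tuple shape
# is hypothetical, for illustration only.

def find_regressions(history):
    """history: list of (commit, status) tuples in commit order.
    Returns the commits at which a previously passing test started failing."""
    regressions = []
    previous = None
    for commit, status in history:
        if previous == "PASS" and status == "FAIL":
            regressions.append(commit)
        previous = status
    return regressions

history = [
    ("c1", "PASS"),
    ("c2", "PASS"),
    ("c3", "FAIL"),   # first failure: candidate regression introduced here
    ("c4", "FAIL"),   # still failing, but not a new regression
]
print(find_regressions(history))  # ['c3']
```

Each detected transition could then be submitted to the database under a dedicated category or user, as described above.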

It would be good to align any ideas you may have with KernelCI's
plans. Also, please take into account that the current
Regression Tracker tool you've created relies on the legacy
system, which is going to be retired in the coming months.

> In general, CI systems tend to define regressions as a low-level concept
> which is rather static: a snapshot of a test result at a certain point
> in time. When it comes to reporting them to developers, there's much
> more info that could be added. In particular, the context of each
> regression, and the fact that a reported regression has a life cycle:
>
> - did this test also fail on other hardware targets or with other kernel
> configurations?
> - is it possible that the test failed because of an infrastructure
> error?

This should be treated as a false-positive failing test rather
than a "regression". But yes, of course we need to deal with
those cases; it's just slightly off-topic here, I think.

> - does the test fail consistently since that commit or does it show
> unstable results?
> - does the test output show any traces of already known bugs?
> - has this regression been bisected and reported anywhere?
> - was the regression reported by anyone? If so, is there someone already
> working on it?

These are all part of the post-regression checks we've been
discussing to run as part of KernelCI. Basically, this extends
the current automated bisection jobs we have and also takes into
account the notion of dynamic scheduling. However, when
collecting data from other CI systems I don't think there is much
we can do if the data is not there. But we might be able to
create collaborations to run extra post-regression checks in
other CI systems to tackle this. For example, with TuxSuite
compatibility in KernelCI we could run CI tests for any system
that relies on TuxSuite.
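For illustration, the life-cycle questions quoted above map naturally onto fields of a per-regression record that post-regression checks could fill in. A hypothetical sketch (none of these field names come from an actual KernelCI or KCIDB schema):

```python
# Hypothetical record enriching a raw test failure with life-cycle
# context. All field names are illustrative, invented for this sketch.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RegressionReport:
    test_path: str
    commit: str
    # Did the test also fail on other hardware targets or configs?
    also_failing_on: list = field(default_factory=list)
    # Could the failure be an infrastructure error (false positive)?
    suspected_infra_error: bool = False
    # Does it fail consistently since that commit, or is it flaky?
    stable_since_commit: bool = True
    # Traces of already-known bugs found in the test output?
    matching_known_bugs: list = field(default_factory=list)
    # Bisection and reporting status.
    bisected: bool = False
    reported_to: list = field(default_factory=list)
    # Is someone already working on it?
    assignee: Optional[str] = None

report = RegressionReport(
    test_path="baseline.dmesg",
    commit="c3",
    also_failing_on=["rk3399-gru-kevin"],  # illustrative platform name
)
print(report.assignee is None)  # nobody assigned yet
```

When data from another CI system lacks these fields, they would simply stay at their defaults, which makes the gaps visible and shows where extra post-regression checks would add value.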

> Many of these info points can be extracted from the CI results databases
> and processed to provide additional regression data. That’s what we’re
> trying to do with the Regression Tracker tool, and we think it’d be
> interesting to start experimenting with the data in KCIDB to see how
> this could be improved and what would be the right way to integrate this
> type of functionality.
>
> Please let us know if that's a possibility and if you'd like to add
> anything to the ideas proposed above.

Experimenting with KCIDB now may be interesting, but depending on
the outcome of the discussions around having one central database
for KernelCI it might not be the optimal way to do it. The
critical thing here is to stay in sync with developments in and
around KernelCI in general, I think.

The new API is about to start its Early Access phase so we need
to stay focused on this for now, and then make sure we have a
reliable production deployment to replace the legacy system in
the coming months. Then the focus should start shifting towards
the more advanced features we'll be enabling with the new API and
we can have a more detailed KernelCI plan to cover this. There's
an "Advanced Features" milestone on the GitHub roadmap [3] for
that. Let's see if we can already do some preparation work by
discussing the topic there, in coordination with whatever extra
efforts you guys might be driving outside the KernelCI realm.

Thanks,
Guillaume

> [1] https://kernel.pages.collabora.com/kernelci-regressions-tracker/
> [2] https://linux-regtracking.leemhuis.info/regzbot/all/

[3] https://github.com/orgs/kernelci/projects/10/views/15