Re: Insights from doing regression tracking for Linux 4.7

From: Rafael J. Wysocki
Date: Mon Aug 01 2016 - 22:27:06 EST


On Monday, August 01, 2016 09:41:53 PM Thorsten Leemhuis wrote:
> In case anyone wonders if I regret doing regression tracking for Linux
> 4.7: No, that is not the case. It isn't really fun, but well, I didn't
> expect it to be ;-) But FWIW, find below a few thoughts about the whole
> regression tracking thing I thought might be good to write down and
> share while they are still fresh in my head.
>
> The TLDR version: I currently think it would help a lot to have
> something like patchwork (https://github.com/getpatchwork/patchwork )
> that is able to track regressions instead.

Well, I had the same idea after I had started to track regressions, but
that turned out to be unrealistic.

The main issue here is that there are multiple places where regressions are
reported and you'd need to subscribe your tool to all of them.

Even if that's doable in practice, they all tend to use different data formats,
engines, conventions etc. and your tool would need to understand all that.

> This is just a vague idea of mine right now that I plan to investigate
> further sooner or later (I'm just coming back from a holiday and
> have various conference talks to prepare -- besides doing my real job
> and regression tracking for 4.8).
>
> Here is the long version with the details that got me thinking in the
> above-mentioned direction (in no particular order):
>
> * Some regression tracking work (track date and place of initial
> report; are things proceeding, compile a report at least once a week,
> ...) feels time-consuming and boring, because most of it is mechanical
> work a computer should do, as that's what they were designed for ;-)

IMO some manual work related to this is unavoidable, unfortunately.

At least you need to put the initial report (date, link etc.) into a database
of some sort.
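
To illustrate how little that minimum is, here is a rough sketch of such
a database entry. The SQLite schema, the field names and the helper
functions are all assumptions made up for this example, not part of any
existing tracker:

```python
import sqlite3
from datetime import date

# Hypothetical schema: the field names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS regressions (
    id INTEGER PRIMARY KEY,
    reported_on TEXT NOT NULL,   -- ISO date of the initial report
    report_url TEXT NOT NULL,    -- link to the initial report
    summary TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'open'
)
"""

def open_db(path=":memory:"):
    """Open (or create) the tracking database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def add_report(conn, reported_on, report_url, summary):
    """Record the initial report of a regression and return its row id."""
    cur = conn.execute(
        "INSERT INTO regressions (reported_on, report_url, summary) "
        "VALUES (?, ?, ?)",
        (reported_on.isoformat(), report_url, summary),
    )
    conn.commit()
    return cur.lastrowid

def open_reports(conn):
    """Return all regressions still marked open, oldest report first."""
    return conn.execute(
        "SELECT id, reported_on, report_url, summary FROM regressions "
        "WHERE status = 'open' ORDER BY reported_on"
    ).fetchall()
```

Everything beyond that (status updates, weekly reports) could be scripted
on top of a table like this, but somebody still has to enter the first row.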

> * My work afaics helped to get a few regressions fixed that otherwise
> might have been forgotten; but quite a few times I noticed pull requests
> mentioning fixes for regressions that I had not been aware of; I wonder
> how many regressions are still out there because both me and the
> relevant subsystem maintainer missed them.

Probably quite a few.

All it takes is to report a regression to the LKML only without CCing the
relevant maintainer and without saying that it is a regression anywhere
in the message subject.
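
As a minimal sketch of that failure mode, assuming a plain RFC 822 message
and a caller-supplied list of maintainer addresses (the function name and
the simple "regression" keyword check are assumptions for illustration
only), a report is likely to be missed when both conditions hold:

```python
from email import message_from_string
from email.utils import getaddresses

def likely_to_be_missed(raw_message, maintainer_addrs):
    """Return True if the subject does not say "regression" and no
    maintainer address appears in To/Cc (hypothetical heuristic)."""
    msg = message_from_string(raw_message)
    subject = (msg.get("Subject") or "").lower()

    # Collect every address from the To and Cc headers.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    addrs = {addr.lower() for _name, addr in recipients}

    flagged = "regression" in subject
    maintainer_ccd = bool(addrs & {m.lower() for m in maintainer_addrs})
    return not flagged and not maintainer_ccd
```

A real tool would need something smarter than a keyword match, since
reporters often do not know that what they are seeing is a regression.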

> * To make regression tracking really work well the workload must be
> spread among multiple people. It would afaics be best to get developers
> and maintainers involved more and give them something at hand to make
> them track regressions in their area of work. But to make that happen we
> need to have some kind of regression tracking software somewhere; and to
> make developers and maintainers actually use it, it has to make their
> life easier somehow (similar to how patch tracking in patchwork makes
> things easier at least for some people).

Agreed.

> * It's hard to become aware of all regressions, as they are reported to
> various places (LKML, bugzilla.kernel.org, as well as lots of other
> mailing lists and a few other bugzillas). So a human or automatic
> regression tracker needs to be told about regressions. That currently
> only sometimes happens for various reasons (I'm new to this; some people
> prefer if their regressions are not mentioned in the spotlight; some
> developers don't like to deal with bugzilla; ...). An email alias or a
> dedicated mailing list might be a step in the right direction; a better
> solution might be a computer program that semi-automatically picks up
> regressions in those places where they are reported currently.
>
> * Manually created regression reports (that's how I did them) sometimes
> quickly get out of date; a more automated solution that compiles and
> publishes up2date reports on the web would be better.
>
> * bugzilla.kernel.org is afaics not a good solution when it comes to
> tracking regressions that are reported in other places. Main reason: Making
> someone (maintainers or someone that tracks regressions) create and
> update bug entries for regressions mentioned in other places is a
> solution that to me seems unlikely to fly due to the overhead bugzilla
> has. Yes, Rafael did it like that when he did regression tracking, but
> I currently think that time is better spent elsewhere; and having one
> issue discussed in two places can quickly lead to confusion. Another
> reason why bugzilla.kernel.org is afaics not a good solution: it is
> sometimes confusing to have tracked regressions and new, not yet
> verified regressions in one place.

I just used the BZ as a database with a Web interface that could be scripted
around.

I didn't see much confusion related to having regression entries created by me
in the BZ. They were clearly distinct from normal BZ reports and that did the
trick, I think.

One benefit was that if a regression was originally reported in the BZ, it
was trivial to add it to the list.

> Those were my main thoughts on the whole thing; here are a few others:
>
> * I'll try to do regression tracking for the 4.7 stable series (until
> 4.8 comes out), but doing it for mainline at least for 4.8 (not sure if
> I can continue my work after that point) has higher priority for now, as
> weeks only have 168 hours :-/
>
> * For now I won't get into regression tracking for longterm kernels, as
> the time afaics is better spent on getting regression tracking on a more
> solid track.
>
> * Right now I only poke people manually when I think something needs to
> be done to get things rolling again. Rafael sent automatic reminder
> mails (together with the weekly reports I think). Maybe that is
> something I should do, too; but OTOH it can quickly result in people
> ignoring those mails.

As a maintainer, I would not ignore them. To me, personally, it would be
useful to receive them in case I overlooked something somewhere.

> * I should have helped to get
> https://git.kernel.org/torvalds/c/262e2bfd7d1e1f1ee48b870e5dfabb87c06b975e
> more quickly to mainline as the commit that introduced the regression
> had already made it to stable and broke things there for a few weeks
> (and there was an earlier patch from jthumshirn that should have fixed
> the issue, too) :-/
>
> * I had hoped to spend more time helping users to (a) identify if their
> problems actually are regressions and (b) getting their regressions
> heard by the right people. :-/
>
> * I didn't do any statistics yet on how much my work helped (some
> regressions I noticed were added and removed from my list between two
> reports and thus for outsiders never really were on the reports); for
> now that's not worth the work afaics.

This touches on a problem that has been there all the time IMO.

Some regressions are fixed really quickly if they are reported, say, during
a merge window in which they had been introduced.

Say, a regression introduced by one of the recent merges is reported today
and there's a fix available tomorrow and it goes into the mainline the day
after.

I'm not really sure it's worth spending time on tracking such things to be
honest.

> * Something I avoided doing for now, because it's afaics quite a bit of
> work: make bugzilla work better. It works well for some issues, but not
> at all for others (most are somewhere in between). I for example suspect
> the list of default assignees needs a major overhaul.
>
> * Something else I'd like to do at some point: Make it easier for novice
> users to test for regressions. That means, among other things: improve the
> documentation so people that want to test find their way in; for example,
> docs that explain "how to test mainline rc kernels on the major
> distributions easily and without risk", "how to use localmodconfig" or
> "bisect regressions for dummies" in a central place would be nice to
> have. And maybe even form groups of people that do regression testing
> regularly; ideally together with Arch Linux, Fedora, openSUSE
> Tumbleweed, and other distros that regularly ship the latest Linux
> versions (and thus have an interest that new mainline versions do not
> have too many regressions).

That's a good idea IMO.

All in all, thanks a lot for doing this work!

Rafael