Re: Reporting bugs and bisection

From: david
Date: Sun Apr 13 2008 - 19:56:37 EST


cross-posted to git for the suggestion at the bottom

On Sun, 13 Apr 2008, Stephen Clark wrote:

> Evgeniy Polyakov wrote:
>> On Sun, Apr 13, 2008 at 10:33:49PM +0200, Rafael J. Wysocki (rjw@xxxxxxx) wrote:
>>> Things like this are very disappointing and have a very negative impact on bug
>>> reporters. We should do our best to avoid them.
>>
>> Shit happens. This is a matter of either bug report or those who were in
>> the copy list. There are different people and different situations, in
>> which they do not reply.
>
> Well less shit would happen if developers would take the time to at least test their patches before they were submitted. It like we will just have the poor user do our testing for us. What kind of testing do developers do. I been a linux user and have followed the LKML for a number of years and have yet to see any test plans for any submitted patches.

I've been reading LKML for 11 years now, and I've tested kernels and reported a few bugs along the way.

the expectation is that the submitter should have tested the patches before submitting them (where hardware allows). but that "where hardware allows" is a big problem. so many issues are dependent on hardware that it's not possible to test everything.

there are people who download, compile and test the tree nightly (with farms of machines to test different configs), but they can't catch everything.

expecting the patches to be tested to the point where there are no bugs is unreasonable.

bisecting is a very powerful tool, but I do think that sometimes developers lean on it a bit much. taking the attitude (as some have) that 'if the reporter can't be bothered to do a bisection I can't be bothered to deal with the bug' is going way too far.

if a bug can be reproduced reliably on a test system then bisecting it may reveal the patch that introduced or unmasked the bug (assuming that there aren't other problems along the way), but if the bug takes a long time to show up after a boot, or only happens under production loads, bisecting it may not be possible. that doesn't mean that the bug isn't real, it just means that the user is going to have to stick with an old version until there is a solution or work-around.

even in the hard-to-test situations, the reporter is usually able to test a few fixes, but there's a big difference between going to management 2-3 times and saying "the kernel gurus think that this will help, can we test it this weekend" and doing a bisection that will take 10-15 cycles to find the problem.
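
To put a rough number on those 10-15 cycles, here is a small sketch of the arithmetic. It assumes an idealized bisection: a linear history, a bug that reproduces reliably, and every bisect point building and booting. The function name bisect_steps is made up for this illustration and is not part of git.

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case number of build-and-test cycles needed to isolate one
    bad commit among num_commits candidates, assuming an idealized
    bisection that halves the candidate set on every test."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed with integer arithmetic to avoid float rounding
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    for n in (1024, 8192, 32768):
        print(f"{n} candidate commits -> {bisect_steps(n)} test cycles")
```

Under those assumptions, 10 cycles corresponds to roughly a thousand candidate commits and 15 cycles to roughly thirty thousand. Every cycle is a build, a reboot and a reproduction attempt, which is what makes this so much more expensive than testing 2-3 targeted fixes.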

it's very reasonable to ask the reporter if they can bisect the problem, but if they say that they can't, declaring that they are out of luck is not reasonable. it just means that it's going to take more thinking to find the problem instead of being able to let the mechanical bisect process narrow things down for you. it may mean that the developer will need to write a patch that instruments an old (working) kernel with minimal impact on that kernel, so that the reporter can run it to gather information about the load and the developer can then try to simulate that load on a new (non-working) kernel.

in theory everyone has a test environment that lets them simulate everything in their production environment. in practice this is only true at the very low end (where it's easy to do) and the very high end (where it's so critical that it's done no matter how much it costs). Everyone else has a test environment that can test most things, but not everything. As such, when they run into a problem they may not be able to do lots of essentially random testing.

elsewhere in this thread someone said that the pre-git way was to do a manual bisect where the developer would send patches backing out specific changes to find the problem. one big difference between that and bisecting the problem is that the manual process was focused on the changes in the area that is suspected of causing the problem, while the git bisect process goes after all changes. this makes it much more likely that the tester will run into unrelated problems along the way.

I wonder if it would be possible to make a variation of git bisect that only looked at a subset of the tree when picking bisect points (if you are looking for an e1000 bug, testing bisect points that haven't changed that driver won't help you, for example). If this can be done it would speed up the reporter's efforts, but it would require more assistance from the developers (who would need to tell the reporters what subtrees to test), so it's a tradeoff of efficiency vs simplicity.
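
A toy model of the idea, to show the size of the possible saving. Everything in it is an assumption made up for illustration: a linear history of 4096 candidate commits, one commit in every 64 touching the e1000 driver, and a culprit that is one of those driver commits. The helper first_bad is hypothetical and is not how git itself is implemented. (git bisect start does accept path arguments after "--" to restrict the search to commits that touch those paths, which is the same idea.)

```python
from typing import Callable, Sequence, Tuple


def first_bad(candidates: Sequence[int],
              is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Binary-search an old-to-new list of commits whose last entry is
    known bad. Returns (first bad commit, number of tests run)."""
    lo, hi = 0, len(candidates) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one build-and-test cycle for the reporter
        if is_bad(candidates[mid]):
            hi = mid        # culprit is at mid or earlier
        else:
            lo = mid + 1    # culprit is after mid
    return candidates[lo], tests


# Hypothetical history: commit 0 is known good, commit 4096 is known bad.
all_commits = list(range(1, 4097))
# Assumed: every 64th commit touches the e1000 driver.
e1000_commits = [c for c in all_commits if c % 64 == 0]
culprit = 64 * 37  # an e1000 commit that introduced the bug


def is_bad(commit: int) -> bool:
    # The bug is present in the culprit and everything after it.
    return commit >= culprit


full_result, full_tests = first_bad(all_commits, is_bad)
limited_result, limited_tests = first_bad(e1000_commits, is_bad)

if __name__ == "__main__":
    print("whole tree:  commit", full_result, "after", full_tests, "tests")
    print("e1000 only:  commit", limited_result, "after", limited_tests, "tests")
```

In this model both searches land on the same commit, but the whole-tree search needs 12 test cycles and the driver-only search needs 6. The saving depends entirely on the guess about the subtree being right: if the bug was actually introduced by a commit outside the chosen paths, the limited search will blame the wrong commit.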

David Lang