Re: Simple script that locks up my box with recent kernels

From: Linus Torvalds
Date: Sat Oct 07 2006 - 17:31:14 EST

Next message: Hugo Vanwoerkom: "sound skips on 2.6.18-ck1"
Previous message: Adrian Bunk: "Re: 2.6.19-rc1 SMP x86_64 boot hungs up"
In reply to: Jesper Juhl: "Re: Simple script that locks up my box with recent kernels"
Next in thread: Grant Coady: "Re: Simple script that locks up my box with recent kernels"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, 7 Oct 2006, Jesper Juhl wrote:
>
> > Can I bother you to just bisect it?
>
> Sure, but it will take a little while since building + booting +
> starting the test + waiting for the lockup takes a fair bit of time
> for each kernel

Sure. That said, we've tried to narrow down things that took hours or days
(under real loads, not some nice test-script) to reproduce, and while it
doesn't always work, the real problem tends to be if the problem case
isn't really reproducible. It sounds like yours is pretty clear-cut, and
that will make things much easier.

> and also due to the fact that my git skills are pretty
> limited, but I'll figure it out (need to improve those git skills
> anyway) :-)

"git bisect" in particular isn't that hard to use, and it will really do
a lot of heavy lifting for you.

Although since it will just select a random commit (well, it's not
"random": it's strictly as half-way as it can possibly be, but it's
automated without any regard for anything else), you can sometimes hit a
situation where git will ask you to test a kernel that simply doesn't work
at all, and you can't even test whether it reproduces your particular bug
or not.

For example, "git bisect" might pick a kernel that just doesn't compile,
because of some stupid bug that was fixed almost immediately afterwards.
In those cases, the total automation of "git bisect" ends up being
something that has to be helped along by hand, and then it definitely
helps to know more about how git works.

Anyway, the quick tutorial about "git bisect" is that once you've given it
the required first "good" and "bad" points, it will create a new branch in
the repository (called "bisect", in case you care), and after that point
it will do a search in the commit DAG (aka "history tree" - it's not a
tree, it's a DAG, since merges will join branches together) for the next
commit that will neatly "split" the DAG into two equal pieces. It will
keep splitting the commit history until you get fed up, or until it has
pinpointed the single commit that caused the problem.

The nicest tool to use during bisection is to just do a

git bisect visualize

that simply starts up "gitk" (the default git history visualizer) to show
what the current state of bisection is. Now, if there are thousands and
thousands of commits, you'll have a really hard time getting a visual clue
about what is going on, but especially once you get to a smaller set of
commits, it's very useful indeed.

And it's _especially_ useful if you hit one of the problem spots where you
can't test the resulting tree for some unrelated reason. When that
happens, you should _not_ mark the problematic commit as being "bad",
because you really don't know - the "badness" of that commit is probably
not related to the "badness" that you're actually searching for.

Instead, you should say "ok, I refuse to test this commit at all, because
it's got other problems, and I will select another commit instead". The
bisection algorithm doesn't care which commit you pick, as long as it's
within the set of "unknown" commits that you'll see with the visualization
tool.

Of course, for efficiency reasons, the _closer_ you get to the half-way
mark, the better. So it's useful to try to pick a commit that is close to
the one that "git bisect" originally chose for you, but that's not a
correctness issue, that's just an issue of "if we have a thousand
potential commits, we're better off bisecting it 400/600 rather than
1/999, even if the exact half-way point isn't testable".

So if you need to decide to pick another point than the one "git bisect"
chose for you automatically, just select that commit in the visualizer
(which will cut the SHA1 name of it), and then do

git reset --hard <paste-sha1-here"

to reset the "bisect" branch to that point instead. And then compile and
test that kernel instead (and then if that's good or bad, you can do the
"git bisect good" or "git bisect bad" thing to mark it so, and git will
continue to bisect the set of commits).

It can be a bit boring, but damn, it's effective. I've used "git bisect"
several times when I've been too lazy to try to really think about what is
going on - I'll happily brute-force bug-finding even if it might take a
little longer, if it's guaranteed to find it (and if the bug is
reproducible, git bisect definitely guarantees to find what made it
appear, even if that may not necessarily be the deeper _cause_ of the bug)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Hugo Vanwoerkom: "sound skips on 2.6.18-ck1"
Previous message: Adrian Bunk: "Re: 2.6.19-rc1 SMP x86_64 boot hungs up"
In reply to: Jesper Juhl: "Re: Simple script that locks up my box with recent kernels"
Next in thread: Grant Coady: "Re: Simple script that locks up my box with recent kernels"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]