Re: [Announce] Linux Test Project

From: Nathan Straz (nstraz@sgi.com)
Date: Thu Aug 10 2000 - 14:12:32 EST


On Thu, Aug 10, 2000 at 11:35:03AM -0400, David Mansfield wrote:
> One question: how is the framework going to handle tests which cause
> pathological behavior in the kernel. For example, 'infinite' hangs in
> the MM system during OOM, or crashes (OOPSes, panics) or deadlocks
> (process stuck in 'D' state). Most often these are the results of the
> tests I tend to run.

That's one of the most imporant things we need to discuss. We
definately need to build into the framework a way to recover from
problems like this. If this will be some type of automated reboot, or
someone walking in a rebooting the machine manually, I don't know. My
goals are to get the system as automated as possible. It may turn out
that we will include these tests as manual tests for completeness. We
would like to discuss any ideas people have.

> Any testbed needs to handle these situations, at least somewhat. For
> example, if I leave a 12 hour regression going and leave the console,
> and I come back to see the system has rebooted itself, or is stuck solid
> - how do I know which test has 'failed'?
>
> Ideally, in the case where something bad but not as fatal as complete
> lockup happens, the test bed should diagnose the fault. This could be
> capturing the oops, or running ps -alx to see where it's stuck.

Capturing data is definately going to be a priority for the system.
Besides capturing the crash dumps, the hardest thing to record will be
the complete configuration of the system. Maybe some combination of
data from /proc will be enough.

> In all the cases where I've tested the kernel, it has been a matter of
> 'Here is the test which crashes the kernel, let's see if it crashes this
> one, too' and I just want to throw this into the consideration of how to
> best create a test harness.

How small do you make these tests? Do they directly attack the problem
or are they more like "after about an hour this usually crashes?" We
definately want anything that directly exploits a bug.

Nate

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Aug 15 2000 - 21:00:22 EST