Re: 2.4.22-pre lockups (now decoded oops for pre10)

From: Chris Mason
Date: Thu Aug 14 2003 - 21:13:36 EST


On Thu, 2003-08-14 at 13:42, Stephan von Krawczynski wrote:

> Hello Marcelo,
>
> the system is up and running, currently:
>
> 7:40pm up 4 days 2:34, 21 users, load average: 2.07, 2.10, 2.06
>
> there is still the verification issue, today I added another 50 GB to the data
> stream, and therefore got additional 3 verification errors. But this seems to
> have no influence on the stability. Box feels ok, reacts completely normal, no
> strange output in any logs.

Just to second Oleg's messages so far: the verification issues are still
serious. They could be the same kind of memory corruption that may be
causing crashes on reiserfs, just showing up in a different place.

We need to find out if a specific kernel release is causing these
corruptions. There are lots of different ways to go about it. I would
suggest a combination of fsx (triggers IO and does verification) and
usemem (sucks down RAM) from the ext3 CVS progs.
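
For readers unfamiliar with what "triggers IO and does verification" means
in practice, here is a minimal sketch of the idea in Python. It is not fsx
and not a substitute for it (the real tool also exercises truncate and mmap
paths); the function name and parameters are hypothetical, chosen only for
illustration. It does random writes and reads on one file and compares every
read against an in-memory copy of what the file should contain.

```python
import random


def stress_and_verify(path, file_size=1 << 20, ops=2000, max_len=65536, seed=0):
    """Do random writes and reads on `path`, checking reads against a shadow copy.

    Returns the number of reads whose data did not match the shadow copy.
    Hypothetical helper for illustration only; this is not how fsx is invoked.
    """
    rng = random.Random(seed)
    shadow = bytearray(file_size)  # what the file is expected to contain
    mismatches = 0
    with open(path, "w+b") as f:
        f.truncate(file_size)  # file starts as all zeros, same as the shadow
        for _ in range(ops):
            offset = rng.randrange(file_size)
            length = rng.randint(1, min(max_len, file_size - offset))
            f.seek(offset)
            if rng.random() < 0.5:
                # Write random bytes and record them in the shadow copy.
                data = rng.getrandbits(8 * length).to_bytes(length, "little")
                f.write(data)
                shadow[offset:offset + length] = data
            else:
                # Read back and compare with what we believe was written.
                if f.read(length) != shadow[offset:offset + length]:
                    mismatches += 1
    return mismatches
```

On a healthy machine the return value should be 0. A non-zero count would
mean the data read back differs from the data written, which is the kind of
symptom the verification errors above point to.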

When you can reliably cause either fsx-linux errors or system hangs in a
short period of time, then we can try different prereleases to find the
offending code.

(download details here: http://www.zipworld.com.au/~akpm/linux/ext3/)

Run 4 or so fsx-linux programs (each on its own file) and use usemem to
push your box into swap. That should hit it pretty quickly, and any
errors from fsx indicate problems.
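
The setup above can be sketched as a small Python launcher. This is only an
illustration under stated assumptions: the binary path, working directory and
file names are hypothetical, and fsx-linux is assumed to take the test file
as its last argument with no other options. usemem is deliberately left out,
since its options are not given in this message; start it separately.

```python
import subprocess
from pathlib import Path


def build_commands(fsx_binary, workdir, count=4):
    """Return one argv list per fsx instance, each pointing at its own file.

    `fsx_binary` and `workdir` are supplied by the caller; the file names
    fsx-test-0, fsx-test-1, ... are made up for this sketch.
    """
    workdir = Path(workdir)
    return [[fsx_binary, str(workdir / f"fsx-test-{i}")] for i in range(count)]


def run_all(commands):
    """Start every command concurrently, wait for all, return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Calling run_all(build_commands("/path/to/fsx-linux", "/mnt/test")) starts four
instances at once and blocks until all of them exit. Any non-zero exit code
in the returned list is worth looking at.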

-chris

