Re: kernel BUG at mm/rmap.c:483!

From: Hugh Dickins
Date: Thu Feb 24 2005 - 00:32:56 EST


On Wed, 23 Feb 2005, Ammar T. Al-Sayegh wrote:
> ----- Original Message ----- From: "Hugh Dickins" <hugh@xxxxxxxxxxx>
> > though quite possibly you cannot afford
> > such experiments on this server, and will revert to 2.4 for now.
>
> The problem is that my server is already in production
> mode. I'm running a great portion of my business on it,
> where there is very little tolerance for downtime.

I feared as much.

> Because the server is located in a remote datacenter,
> every time it goes down it takes several hours to have
> someone sent up there to manually reboot it for a hefty
> emergency fee. So this bug has already cost me a lot of
> money, and I'm worried that it will cost me a lot of my
> clients as well if it persists.

I'm very sorry for that.

> Remote hands are rather expensive, so it will cost me
> $100/hr to have someone run memtest86 on my server
> since I can't perform it remotely. I'll do it though
> since that's your recommendation for the time being.
> Hope it will not take more than an hour to run the
> test, and hope it turns out to be bad memory modules as
> you expect because I hate to downgrade after all the
> time and money I expended on the upgrade.

One hour will be enough if it does find a problem in that time, so it
is worth a shot; but it is not enough to give confidence in the memory
if it does not find one: 12 hours would be better. I actually wonder
whether rmap.c:483 is the best memory tester (serious answer
would be, in some cases yes, but not in all).

Do let me know. If I can find time to rejig the debug patch
against your kernel, it would itself keep your server running,
replacing the BUG_ON by printks and safety. But without knowing
what it will report, I can't judge how satisfactory that would
be (and it's unlikely to lead us to the final answer in one go).

Hugh