Re: single bit flip detector.

From: Andrew Morton
Date: Fri Aug 04 2006 - 17:18:31 EST


On Tue, 1 Aug 2006 20:16:26 -0400
Dave Jones <davej@xxxxxxxxxx> wrote:

> In case where we detect a single bit has been flipped, we spew
> the usual slab corruption message, which users instantly think
> is a kernel bug. In a lot of cases, single bit errors are
> down to bad memory, or other hardware failure.
>
> This patch adds an extra line to the slab debug messages
> in those cases, in the hope that users will try memtest before
> they report a bug.

Well boy, this has to be the most-reviewed patch ever. You'd think that
I'd apply it with great confidence and warm fuzzies.

However...


From: Andrew Morton <akpm@xxxxxxxx>

- one decl per line is more patching-friendly and a bit more idiomatic.

- make `bad_count' an int: a uchar might overflow

- Put a blank line between decls and code

- rename `total' to `error', remove `errors'.

- there's no need to sum up the errors.

- don't need to check for non-zero `errors': we know it is != POISON_FREE.

- make it look non-crapful in an 80-col window.

- add missing spaces in arithmetic

Cc: Dave Jones <davej@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---


diff -puN mm/slab.c~single-bit-flip-detector-tidy mm/slab.c
--- a/mm/slab.c~single-bit-flip-detector-tidy
+++ a/mm/slab.c
@@ -1637,11 +1637,13 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
- unsigned char total = 0, bad_count = 0, errors;
+ unsigned char error = 0;
+ int bad_count = 0;
+
printk(KERN_ERR "%03x:", offset);
for (i = 0; i < limit; i++) {
if (data[offset + i] != POISON_FREE) {
- total += data[offset + i];
+ error = data[offset + i];
bad_count++;
}
printk(" %02x", (unsigned char)data[offset + i]);
@@ -1649,11 +1651,13 @@ static void dump_line(char *data, int of
printk("\n");

if (bad_count == 1) {
- errors = total ^ POISON_FREE;
- if (errors && !(errors & (errors-1))) {
- printk(KERN_ERR "Single bit error detected. Probably bad RAM.\n");
+ error ^= POISON_FREE;
+ if (!(error & (error - 1))) {
+ printk(KERN_ERR "Single bit error detected. Probably "
+ "bad RAM.\n");
#ifdef CONFIG_X86
- printk(KERN_ERR "Run memtest86+ or a similar memory test tool.\n");
+ printk(KERN_ERR "Run memtest86+ or a similar memory "
+ "test tool.\n");
#else
printk(KERN_ERR "Run a memory test tool.\n");
#endif
_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/