Re: Fake ext3 corruption on raid5 in 2.6.11.9 smp

From: Olivier Guerrier
Date: Thu May 26 2005 - 21:26:21 EST


First, I just realized the typo in my previous message's subject:
Should read 2.6.*11.*9 smp, instead of 2.6_.9. Sorry :|

jpearson wrote:
Hi,

I saw this exact same error (EXT3-fs error (device dm-x): ext3_readdir:
bad entry in directory #nnnnnnn: rec_len % 4 != 0 - offset=0, inode=xxxxxxxx,
rec_len=...

from time to time in my non-SMP RAID system with 512Mb RAM, with ext3 on LVM on top of RAID5.

Never caused actual corruption - run FSCK, no errors, remount rw
successfully until next time; error rarely in the same place, but always
in a directory and rec_len % 4 != 0. Looks like an 'in-kernel' thing,
because (e.g.) running find on the volume after remounting rw produced
no issues, so presumably the on-disk directory wasn't *really* the
issue.

I confirm this here too: random place, always a dir, always 'rec_len % 4 != 0', no fs issue or data loss (so far...)

Filesystems between about 8 and 50 Gb, and not what I'd characterise as a
heavy load.

By heavy load, I mean a system load between 10 and 15 for 3 hours (before error) Processes running were several instances of mkisofs (reading from and writing to the faulty partition)

This was with about 2.6.4 - 2.6.7. I'm running 2.6.11 now and haven't
seen it in some time; so it was either fixed by 2.6.11, or mounting ro
by default has just reduced my exposure.

As my kernel is a 2.6.11.9, It is not fixed so far.

I will reformat when possible, this time I will use lvm over raid5, so I can use xfs for my usefull data, and keep a medium ext3 partition to make tests if needed (just need to know what to test)

Thanks
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/