2.6.13-rc2-almost-rc3, ext3 and memory corruption

From: Petr Vandrovec
Date: Fri Sep 30 2005 - 22:01:16 EST


Hello,
I was so happy that 2.6.13-rc3 was released that I tried to grab it,
and while I was doing that, box said:

Slab corruption: start=ffff8100005e5bf8, len=96
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 01 00 00 00 6b 6b 6b 6b
Prev obj: start=ffff8100005e5b80, len=96
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=ffff8100005e5c70, len=96
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

It looks like that journal head's b_jcount was set to 1 after
journal head was already released. It looks like that I must
have been extremelly unlucky, as all b_jcount operations I
can find are either jh->b_jcount++ or --jh->b_jcount, so
schedule() (and free()) must have happened after b_jcount was
read from memory, but before incremented value was written
back (or I have missed some direct assignment, but I see none).
On other side all these b_jcount manipulations should be under
jbd_lock_bh_journal_head, which both disables preemption and
acquires some bitlock. So I'm again missing something fundamental,
as this use-after-free should never happen, and if it could
happen I should get 6c 6b 6b 6b and not 01 00 00 00...

Box has two ext3 filesystems, and when these oopses occured I
was running one apt-get install in /, apt-get update in the /pure64
chroot, apt-get source apt in /usr/src/debian (but maybe this one
was already finished at that time) and cg-export /tmp/linux-2.6.13-rc3
on fourth console. git's kernel repository is on second disk, everything
else on first one. So disks and ext3 were quite busy when problem
occured.

I'll try to reproduce it with 2.6.13-rc3 (running kernel was from
wednesday git, as of commit 95001ee...), but I have no idea how
reproducible it is, and as I see no ext3 or jbd changes since wednesday
I believe it is not fixed yet.

Box in question has single Athlon64 processor, runs SMP PREEMPT kernel
with almost all debugging options set (but not CONFIG_JBD_DEBUG). Box
has 2GB of ECC RAM. Root filesystem is on pata disk connected to the
pata port on Promise's 20378 (using sata_promise). Disk with git repository
is plugged to pata port on Via's VT8237. Kernel's configuration is
at http://platan.vc.cvut.cz/vana_at_home_config.txt, dmesg at
http://platan.vc.cvut.cz/vana_at_home_dmesg.txt.

Thanks,
Petr Vandrovec
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/