Re: bio linked list corruption.

From: Andy Lutomirski
Date: Tue Oct 18 2016 - 21:06:10 EST


On 10/18/2016 05:10 PM, Linus Torvalds wrote:
On Tue, Oct 18, 2016 at 4:42 PM, Chris Mason <clm@xxxxxx> wrote:

Seems to be the whole thing:

Ahh. On lkml, so I do have it in my mailbox, but Dave changed the
subject line when he tested on ext4 rather than btrfs..

Anyway, the corrupted address is somewhat interesting. As Dave Jones
said, he saw

list_add corruption. prev->next should be next (ffffe8ffff806648),
but was ffffc9000067fcd8. (prev=ffff880503878b80).
list_add corruption. prev->next should be next (ffffe8ffffc05648),
but was ffffc9000028bcd8. (prev=ffff880503a145c0).

and Dave Chinner reports

list_add corruption. prev->next should be next (ffffe8ffffc02808),
but was ffffc90005f6bda8. (prev=ffff88013363bb80).

and it's worth noting that the "but was" is a remarkably consistent
vmalloc address (the ffffc9000.. pattern gives it away). In fact, it's
identical across two boots for DaveJ in the low 14 bits, and fairly
high up in those low 14 bits (0x3cd8).

DaveC has a different address, but it's also in the vmalloc space, and
also looks like it is fairly high up in 14 bits (0x3da8). So in both
cases it's almost certainly a stack address with a fairly empty stack.
The differences are presumably due to different kernel configurations
and/or just different filesystems calling the same function that does
the same bad thing but now at different depths in the stack.
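
The arithmetic behind the "low 14 bits" observation can be checked with a few lines of C. This is only a sketch: it assumes 16 KiB kernel stacks aligned to their own size (which is what masking with 14 bits implies), and the names STACK_SIZE_ASSUMED, stack_offset and dump_reported_addresses are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 16 KiB (1 << 14) stacks, aligned to their size. */
#define STACK_SIZE_ASSUMED (UINT64_C(1) << 14)

/* Offset of an address within its (assumed) size-aligned stack. */
uint64_t stack_offset(uint64_t addr)
{
    return addr & (STACK_SIZE_ASSUMED - 1);
}

/* Bytes between the address and the top of the (assumed) stack.
 * A small value means the stack was fairly empty at that point. */
uint64_t bytes_below_top(uint64_t addr)
{
    return STACK_SIZE_ASSUMED - stack_offset(addr);
}

/* Print offset and depth for the three "but was" values quoted above. */
void dump_reported_addresses(void)
{
    static const uint64_t reported[] = {
        UINT64_C(0xffffc9000067fcd8), /* DaveJ, first boot  */
        UINT64_C(0xffffc9000028bcd8), /* DaveJ, second boot */
        UINT64_C(0xffffc90005f6bda8), /* DaveC              */
    };

    for (size_t i = 0; i < sizeof(reported) / sizeof(reported[0]); i++)
        printf("%016llx: offset 0x%04llx, %llu bytes below top\n",
               (unsigned long long)reported[i],
               (unsigned long long)stack_offset(reported[i]),
               (unsigned long long)bytes_below_top(reported[i]));
}
```

Under that assumption both of DaveJ's values land at offset 0x3cd8 and DaveC's at 0x3da8, i.e. within roughly the top kilobyte of a 16 KiB stack.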

Adding Andy to the cc, because this *might* be triggered by the
vmalloc stack code itself. Maybe the re-use of stacks showing some
problem? Maybe Chris (who can't see the problem) doesn't have
CONFIG_VMAP_STACK enabled?

Wouldn't this cause the exact opposite problem? If the warning is to be believed, then prev is *not* on the stack but somehow prev->next ended up pointing to the stack. If stack reuse caused something to corrupt a value on the stack, then how would this cause a stack address to be written to a non-stack location? All I can think of is that "prev" itself is corrupted somehow.
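
To make it concrete what the warning is and isn't telling us, here is a rough userspace model of the kind of sanity check that prints it. This is a simplified sketch, not the kernel's actual lib/list_debug.c code, and list_add_valid / list_add_tail_checked are names chosen for the example. The point is that the check reads prev->next out of the object that prev points to, so the "but was" value is whatever is stored in *prev, wherever prev itself lives.

```c
#include <stdbool.h>
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Simplified model of the debug check: verify that prev and next
 * still point at each other before linking a new entry between them. */
bool list_add_valid(struct list_head *prev, struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        /* The case in the reports: the value read from prev->next is wrong. */
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    return true;
}

/* Add entry at the tail of the list, i.e. between head->prev and head.
 * Returns false and leaves the list untouched if the check fails. */
bool list_add_tail_checked(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;
    struct list_head *next = head;

    if (!list_add_valid(prev, next))
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}
```

In this model the second message fires whenever the last entry's next pointer has been overwritten, or when head->prev leads to something that is no longer a valid list entry. The check alone cannot distinguish those two cases.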

One possible debugging approach would be to change:

#define NR_CACHED_STACKS 2

to

#define NR_CACHED_STACKS 0

in kernel/fork.c and to set CONFIG_DEBUG_PAGEALLOC=y. The latter will force an immediate TLB flush after vfree.

Also, CONFIG_DEBUG_VIRTUAL=y can be quite helpful for debugging stack issues. I'm tempted to do something equivalent to hardwiring that option on for a while if CONFIG_VMAP_STACK=y.