Re: Kernel Panics in the network stack

From: Russell King - ARM Linux
Date: Tue Dec 22 2009 - 06:25:42 EST


On Tue, Dec 22, 2009 at 11:08:25AM +0000, Catalin Marinas wrote:
> On Tue, 2009-12-22 at 10:09 +0000, Eric Dumazet wrote:
> > I found an old commit mentioning a problem with the LDM instruction, which
> > could be interrupted and restarted with the base register already changed,
> > so that we load registers with garbage.
> [...]
> > If the low interrupt latency mode is enabled for the CPU (from ARMv6
> > onwards), the ldm/stm instructions are no longer atomic. An ldm instruction
> > restoring the sp and pc registers can be interrupted immediately after sp
> > was updated but before the pc. If this happens, the CPU restores the base
> > register to the value before the ldm instruction but if the base register
> > is not sp, the interrupt routine will corrupt the stack and the restarted
> > ldm instruction will load garbage.
> [...]
> > I found one instance of the LDM instruction in 2.6.30 that could have the same problem:
> >
> > __switch_to:
> >
> > ...
> > ldm r4, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
>
> It looks to me like it is possible to get an interrupt after SP was
> loaded but before PC; the stack could then be corrupted and PC would be
> loaded with garbage. One instance of your oops messages looks like PC
> corruption, but the other may be caused by something else. What ARM CPU
> are you using?
>
> I'm cc'ing Russell as well, it's strange that we haven't got any issue
> with this so far.
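
The failure mode described in the quoted commit text can be modelled in a
few lines of C. This is only a toy sketch, not kernel code: the struct, the
helper names, the word-indexed stack and the epilogue form
"ldmdb fp, {fp, sp, pc}" are all assumptions made for illustration, and it
models a frame-based return rather than __switch_to itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16
#define GARBAGE     0xDEADBEEFu

/* Hypothetical CPU model: fp and sp are word indices into stack[]. */
struct cpu {
    uint32_t fp, sp, pc;
    uint32_t stack[STACK_WORDS];
};

/* Hypothetical interrupt: pushes n words below sp, then pops them again.
 * sp ends up unchanged, but the memory below it has been overwritten. */
static void irq_handler(struct cpu *c, unsigned n)
{
    assert(c->sp >= n && c->sp <= STACK_WORDS);
    for (unsigned i = 1; i <= n; i++)
        c->stack[c->sp - i] = GARBAGE;
}

/* Model of "ldmdb fp, {fp, sp, pc}". With irq_after_sp set, an interrupt
 * arrives after sp is written but before pc; the base register is then
 * restored and the whole instruction is restarted. */
static void ldm_return(struct cpu *c, int irq_after_sp)
{
    uint32_t base = c->fp;

    assert(base >= 3 && base <= STACK_WORDS);
    if (irq_after_sp) {
        c->fp = c->stack[base - 3];   /* base register overwritten */
        c->sp = c->stack[base - 2];   /* sp now points above the frame */
        irq_handler(c, 3);            /* handler clobbers the frame */
        c->fp = base;                 /* base restored for the restart */
    }
    c->pc = c->stack[base - 1];
    c->sp = c->stack[base - 2];
    c->fp = c->stack[base - 3];
}

/* Build a frame at stack[5..7]: saved fp, saved sp, saved pc. */
static struct cpu make_frame(void)
{
    struct cpu c;

    memset(&c, 0, sizeof c);
    c.fp = 8;              /* frame record ends just below index 8 */
    c.sp = 4;              /* current sp, below the frame */
    c.stack[5] = 12;       /* saved fp */
    c.stack[6] = 8;        /* saved sp, just above the frame */
    c.stack[7] = 0x8000;   /* saved pc (return address) */
    return c;
}
```

Calling ldm_return() on a fresh make_frame() with irq_after_sp == 0 returns
to the saved pc; with irq_after_sp == 1 the restarted load picks up the
handler's data instead.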

We don't see the issue because we explicitly disable low latency
interrupt mode.
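
For reference, the setting in question is the FI bit, bit 21 of the CP15
control register on ARMv6. A minimal sketch of testing and clearing it on a
register value follows; the function names are made up for illustration, and
reading or writing the real register needs privileged mrc/mcr accesses that
are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* FI: low interrupt latency configuration, control register bit 21. */
#define CR_FI (UINT32_C(1) << 21)

/* Hypothetical helper: true if low latency interrupt mode is enabled
 * in the given control register value. */
static bool low_latency_irq_enabled(uint32_t cr)
{
    return (cr & CR_FI) != 0;
}

/* Hypothetical helper: the same control register value with low latency
 * interrupt mode disabled and all other bits left alone. */
static uint32_t clear_low_latency_irq(uint32_t cr)
{
    return cr & ~CR_FI;
}
```

With FI clear, an ldm/stm runs to completion before an interrupt is taken,
which is the behaviour the existing ldm sequences depend on.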

> You could try #undef'ing __ARCH_WANT_INTERRUPTS_ON_CTXSW in
> arch/arm/include/asm/system.h as a sanity check for your aborts.

Unfortunately, we can't do that for older ARM architectures without
severely impacting the interrupt latency there. Not only that, but
the interrupt latency will be increased during any context switch.

I really question the value of this "low latency interrupt" setting.
If you're worried about interrupts being disabled for a very small
number of bus cycles for an LDM, then you're going to be screaming
merry hell about the places in the kernel where interrupts are masked.
The two just do not go together.

The only case for enabling the low latency interrupt mode would be if
you have tightly controlled software which never disables interrupts.
Linux does not fall into that category, so enabling it is pointless
and causes unnecessary problems.

Given that, the simple and obvious solution is: do not modify the kernel
to enable low interrupt latency mode.