Re: [PATCH] i386: improve double fault handling

From: Ingo Molnar
Date: Mon Jul 28 2008 - 09:43:21 EST



* Jan Beulich <jbeulich@xxxxxxxxxx> wrote:

> >>> Ingo Molnar <mingo@xxxxxxx> 21.07.08 13:05 >>>
> >this still doesnt apply to latest -git. (or tip/master)
>
> Indeed, tip/master had a __pa -> __phys_addr_const conversion that I
> now sync-ed the patch with (without another round of testing):
>
> Make the double fault handler use CPU-specific stacks. Add some
> abstraction to simplify future change of other exception handlers to
> go through task gates. Add a new notification of the event through the
> die notifier chain, also providing some environmental adjustments so
> that various infrastructural things work independent of the fact that
> the fault and the callbacks are running on other then the normal
> kernel stack.
>
> Signed-Off-By: Jan Beulich <jbeulich@xxxxxxxxxx>
> Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
>
> ---
> arch/x86/kernel/cpu/common.c | 17 +++++--
> arch/x86/kernel/doublefault_32.c | 86 ++++++++++++++++++++++++---------------
> arch/x86/kernel/smpboot.c | 44 +++++++++++++++++++
> arch/x86/kernel/traps_32.c | 51 ++++++++++++++++++++++-
> drivers/lguest/segments.c | 3 -
> include/asm-x86/kdebug.h | 1
> include/asm-x86/processor.h | 7 ++-
> include/asm-x86/segment.h | 15 ++++--
> include/asm-x86/thread_info_32.h | 9 +++-
> 9 files changed, 187 insertions(+), 46 deletions(-)

I dont know.

All CPUs hitting a double fault simultaneously and corrupting each
others' kernel stack is a theoretical possibility - but is handling it
worth the complexity? It appears to me that a lock plus a short stub
function that takes the lock (with no stack usage) would handle that
much better.

Also, you seem to be setting things up to turn NMIs and MCEs into task
gates too, right?

So i'm really uneasy about all this. Breakage in such rarely used code
gets found very late, and has thus a high risk of losing debug
information when we need it the most. (i.e. it works in the exact
_opposite_ way of the intented goal of making things more robust - it
makes things less robust)

Firstly, 64-bit does not use a task gate for double faults anymore. (but
uses a separate IST stack for double faults)

Secondly, task gates are really a relic that should not be proliferated.
Besides the complications in virtualized environments (if more common
things like Big Real Mode are not supported well in virtual mode what do
we expect of more esoteric features as task gates?) it does not get
nearly as much testing on real silicon as other, more mainstream CPU
features.

Thirdly, NMI based profiling is quite common, so by turning NMIs into
task gates we'd slow that down quite a lot.

Also, the change to doublefault_fn is quite ugly - that inner block
should be split out into a separate function.

Plus the notifier - why do we care about that? It's not like we can
sanely kexec into a safe kernel from double faulting kernels in most
cases. In real cases where i've seen double faults it was due to us
corrupting kernel pagetables - kexec has no chance there. To recover
from that we'd have to set up the TSS with a safe(r) cr3 as well - but
your patch leaves _that_ untouched. (nor do we want to waste extra
unswappable memory on such remote possibilities i think)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/