RE: [PATCH (v2 - fixup)] x86: Have debug/nmi restore the IDT it replaced

From: Seiji Aguchi
Date: Mon Jun 03 2013 - 15:45:23 EST


Steven,

I think we can solve this problem simply by checking the "current IDT"
rather than by holding and restoring a copy of it; see the sketch
below the patch list.

Please see a patch set below.
[PATCH v13 0/3] trace,x86: irq vector tracepoint support
[PATCH v13 1/3] tracing: Add DEFINE_EVENT_FN() macro
[PATCH v13 2/3] trace,x86: Introduce entering/exiting_irq()
[PATCH v13 3/3] trace,x86: Add irq vector tracepoints
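
The idea, roughly (a minimal sketch; the helper names below are
illustrative, not necessarily what the patches end up using):

	/*
	 * Instead of saving and restoring a copy of the IDT, pick
	 * the right table to load based on what is currently enabled.
	 */
	static inline void load_current_idt(void)
	{
		if (is_debug_idt_enabled())
			load_debug_idt();	/* stack-safe debug/NMI IDT */
		else if (is_trace_idt_enabled())
			load_trace_idt();	/* IDT with traced vectors */
		else
			load_idt((const struct desc_ptr *)&idt_descr);
	}

The NMI/debug exit path would then call load_current_idt() instead of
toggling back to idt_descr unconditionally.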

Seiji

> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@xxxxxxxxxxx]
> Sent: Tuesday, May 28, 2013 9:32 AM
> To: Ingo Molnar
> Cc: H. Peter Anvin; LKML; Thomas Gleixner; Frederic Weisbecker; Andrew Morton; Seiji Aguchi; Borislav Petkov (bp@xxxxxxxxx)
> Subject: [PATCH (v2 - fixup)] x86: Have debug/nmi restore the IDT it replaced
>
> I've been playing with Seiji's patches:
>
> https://lkml.org/lkml/2013/1/21/573
>
> Which add tracing to the x86 interrupt vectors. But to keep tracing
> overhead down to zero when tracing is inactive, they swap in a
> different IDT, when tracing is enabled, that replaces the irq vectors
> with versions that contain tracepoints. Even though disabled
> tracepoints are just nops, the swap eliminates even that overhead.
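>
> (Roughly how the swap works -- a sketch assuming the patches keep a
> second descriptor, trace_idt_descr, pointing at a table whose stubs
> contain the tracepoints; switch_idt is an illustrative name:)
>
> 	static void switch_idt(void *arg)
> 	{
> 		bool enable = *(bool *)arg;
>
> 		/* each CPU loads the table matching the tracing state */
> 		if (enable)
> 			load_idt((const struct desc_ptr *)&trace_idt_descr);
> 		else
> 			load_idt((const struct desc_ptr *)&idt_descr);
> 	}
>
> 	/* from the tracepoint registration/unregistration callback */
> 	on_each_cpu(switch_idt, &enable, 1);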
>
> These seemed to work, until I ran the following command with trace-cmd:
>
> trace-cmd start -p function_graph -g "smp_trace_apic_timer_interrupt" \
> -g "smp_apic_timer_interrupt" -e irq_vectors
>
> The above trace-cmd command not only enables the irq_vectors events,
> but also produces the call graphs of the two interrupt vector
> functions: smp_trace_apic_timer_interrupt and smp_apic_timer_interrupt.
>
> The result was rather surprising. It showed only
> smp_apic_timer_interrupt being called, and not the trace version.
> Obviously, the tracepoints for the vectors were also not there.
>
> When starting without the function_graph tracer, it worked fine, but
> to be sure, I wanted to see the trace versions actually being called,
> which required one of the function tracers.
>
> Investigating, I found the problem. It's in the NMI and breakpoint
> IDT handling. I won't explain it too much here, but see:
>
> commit f8988175f "x86: Allow nesting of the debug stack IDT setting"
> commit 42181186a "x86: Add counter when debug stack is used with
> interrupts enabled"
> commit 228bdaa95 "x86: Keep current stack in NMI breakpoints"
>
> The basic gist of the above commits is that on entry to an NMI or
> DEBUG trap, the kernel needs to modify the IDT so that the stack
> pointer doesn't get reset to the top of the stack again. On boot up,
> two IDT tables are created: one for normal operations and one for
> this NMI/DEBUG case.
>
> The issue is that the code just toggles between the two IDT tables.
> But if this happens after Seiji's swap has already taken place, it
> switches the trace IDT back to the normal table, and tracing stops,
> which sorta defeats the purpose.
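>
> For reference, the pre-patch logic is essentially this unconditional
> toggle (the reset side is visible in the removed lines of the diff
> below):
>
> 	void debug_stack_set_zero(void)
> 	{
> 		this_cpu_inc(debug_stack_use_ctr);
> 		load_idt((const struct desc_ptr *)&nmi_idt_descr);
> 	}
>
> 	void debug_stack_reset(void)
> 	{
> 		/*
> 		 * Always falls back to the boot IDT, even if the
> 		 * trace IDT was live when the NMI/debug trap hit.
> 		 */
> 		if (this_cpu_dec_return(debug_stack_use_ctr) == 0)
> 			load_idt((const struct desc_ptr *)&idt_descr);
> 	}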
>
> To solve this, I've added a 'hold_idt_descr' per-cpu variable that
> stores the IDT that was loaded on entry, and that saved value is used
> for the restore. If the DEBUG/NMI came in while the normal IDT was
> loaded, it restores the normal IDT, and if it came in while the trace
> IDT was loaded, it puts back the trace IDT.
>
> I've run a few tests so far on this code, but I need to run more
> stressful ones. Meanwhile, until I find any bugs, I'm posting this
> patch for your enjoyment. I think I got all the cases, as NMIs cause
> the store/restore functions to be re-entrant without any locks.
>
> Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/common.c | 79 ++++++++++++++++++++++++++++++++++++++++-
> 1 files changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 22018f7..1315367 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1149,10 +1149,54 @@ int is_debug_stack(unsigned long addr)
> }
>
> static DEFINE_PER_CPU(u32, debug_stack_use_ctr);
> +static DEFINE_PER_CPU(struct desc_ptr, hold_idt_descr);
> +static DEFINE_PER_CPU(struct desc_ptr, copy_idt_descr);
>
> +/*
> + * Debug traps and NMIs will swap the IDT to have the debug
> + * trap not modify the stack (nmi_idt_descr). But as both the
> + * debug and NMIs share this, they need to be re-entrant. A debug
> + * trap could be doing the swap and after it incs the debug_stack_use_ctr,
> + * an NMI could come in. It too would need to do the swap, and it would
> + * need to swap every time.
> + *
> + * But, the IDT can be changed by other users, and this must restore
> + * that as well.
> + *
> + * Luckily, as interrupts are disabled from the set_zero to reset,
> + * we do not need to worry about the IDT changing underneath us
> + * from other users.
> + */
> void debug_stack_set_zero(void)
> {
> + /*
> + * Writing to the IDT is atomic. If an NMI were to come
> + * in after we did the compare but before the store_idt(),
> + * it too would see the address == 0 and do the store itself.
> + */
> + if (this_cpu_read(hold_idt_descr.address) == 0)
> + store_idt(this_cpu_ptr(&hold_idt_descr));
> +
> + /*
> + * If an NMI were to come in now, it would not set hold_idt_descr,
> + * but on exit, it would restore the IDT because the counter is
> + * still zero here. Then it would set the .address back to zero too.
> + */
> +
> this_cpu_inc(debug_stack_use_ctr);
> +
> + /*
> + * NMI's coming in now are not an issue as we set the .address
> + * and also incremented the ctr. But, if an NMI came in before
> + * the counter was incremented, and after the .address was set,
> + * the NMI would set the .address back to zero, and would have
> + * restored the IDT. We need to check if that happened.
> + * If it did, then the .address would be zero here again.
> + */
> + if (unlikely(this_cpu_read(hold_idt_descr.address) == 0))
> + store_idt(this_cpu_ptr(&hold_idt_descr));
> +
> + /* On enter, we always want to use the nmi_idt_descr */
> load_idt((const struct desc_ptr *)&nmi_idt_descr);
> }
>
> @@ -1160,8 +1204,39 @@ void debug_stack_reset(void)
> {
> if (WARN_ON(!this_cpu_read(debug_stack_use_ctr)))
> return;
> - if (this_cpu_dec_return(debug_stack_use_ctr) == 0)
> - load_idt((const struct desc_ptr *)&idt_descr);
> +
> + if (WARN_ON(!this_cpu_read(hold_idt_descr.address)))
> + return;
> +
> + /*
> + * This is the tricky part. We need to restore the old
> + * IDT to what it was before we entered, but an NMI could
> + * come in at any point, and do the same.
> + *
> + * If the count is 1, then we are the first to enter and
> + * we need to update the IDT. But, we must do that after
> + * we decrement the counter, in which case, if an NMI
> + * comes in, it too will see the 1. To get around this,
> + * we update a copy first.
> + *
> + * The copy will always contain what we want to load the
> + * IDT with.
> + */
> + if (this_cpu_read(debug_stack_use_ctr) == 1)
> + *this_cpu_ptr(&copy_idt_descr) = *this_cpu_ptr(&hold_idt_descr);
> +
> + if (this_cpu_dec_return(debug_stack_use_ctr) == 0) {
> + /*
> + * We load the IDT from the copy, as it will always contain
> + * what we want to restore the IDT to.
> + */
> + load_idt(this_cpu_ptr(&copy_idt_descr));
> + /*
> + * Now clear out the hold_idt_descr.address, to let all new
> + * users restore it from the IDT.
> + */
> + this_cpu_write(hold_idt_descr.address, 0);
> + }
> }
>
> #else /* CONFIG_X86_64 */
> --
> 1.7.3.4
>
>
