RE: [PATCH v2] [LBR] Dump LBRs on Exception

From: Berthier, Emmanuel
Date: Tue Dec 02 2014 - 14:38:45 EST


> From: Andy Lutomirski [mailto:luto@xxxxxxxxxxxxxx]
> Sent: Friday, November 28, 2014 4:15 PM
> To: Berthier, Emmanuel
> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML
> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
>
> On Fri, Nov 28, 2014 at 12:44 AM, Berthier, Emmanuel
> <emmanuel.berthier@xxxxxxxxx> wrote:
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index df088bb..f39cded 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -1035,6 +1035,46 @@ apicinterrupt IRQ_WORK_VECTOR \
> > irq_work_interrupt smp_irq_work_interrupt #endif
> >
> > +.macro STOP_LBR
> > +#ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
> > + testl $3,CS+8(%rsp) /* Kernel Space? */
> > + jz 1f
> > + testl $1, lbr_dump_on_exception
>
> Is there a guarantee that, if lbr_dump_on_exception is true, then LBR is on?
> What happens if you schedule between stopping and resuming LBR?

Good point. The current assumption is to rely on the numerous exceptions to "re-arm" the LBR recording.
Even if we bypass UserSpace page faults, we can keep rely on kernel VMalloc page faults to re-arm the recording.

> > + jz 1f
> > + push %rax
> > + push %rcx
> > + push %rdx
> > + movl $MSR_IA32_DEBUGCTLMSR, %ecx
> > + rdmsr
> > + and $~1, %eax /* Disable LBR recording */
> > + wrmsr
>
> wrmsr is rather slow. Have you checked whether this is faster than just
> saving the LBR trace on exception entry?

The figures I have show that for common MSR, rdmsr and wrmsr have quite the same impact, around 100 cycles (greatly depends on the arch).
The cost of stop/start is: 2 rdmsr + 2 wrmsr = 4 msr
The cost of reading LBR is: 1 rdmsr for TOS + 2 rdmsr per record, and there are from 8 to 32 Records (Arch specific) = between 17 to 65 msr.
I've measured on Atom arch (8 records): LBR read versus stop/start: around x3 more time.
As the LBR size is arch dependent, it's not easy to implement the record reading in ASM without any branch, and this would generate maintenance dependency.
I prefer to let perf_event_lbr dealing with all that stuff.

Thx,

Emmanuel.