Re: [PATCH v2] [LBR] Dump LBRs on Exception

From: Robert Jarzmik
Date: Sat Dec 06 2014 - 05:31:28 EST


Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:

> I don't really care about the number of instructions.
Right, a couple of test/jz/jnz instructions is negligible in the exception path;
that's what I think as well.

> But there are still all the nasty cases:
>
> - Context switch during exception processing (both in the C handler
> and in the retint code).
> - PMI during exception processing.
> - Exception while perf is poking at LBR msrs.

Yes.
Wasn't that what Thomas's suggestion of a per-cpu variable was meant to solve?
I.e.:
DEFINE_PER_CPU(unsigned long, lbr_dump_state) = LBR_OOPS_DISABLED;
...

We would have an "LBR resource" variable to track who owns the LBR:
 - nobody: LBR_UNCLAIMED
 - the exception handler: LBR_EXCEPTION_DEBUG_USAGE
   - activated with a runtime variable or config
   - impossible to activate if perf has hold of it
 - the perf code: LBR_PERF_USAGE
   - activated through perf infrastructure
   - impossible to activate if the exception handler has hold of it
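
To make the ownership rule concrete, here is a minimal user-space sketch of
the claim/release logic. It is only a model under assumptions: the helper
names lbr_claim() and lbr_release() are hypothetical, a single C11 atomic
stands in for one CPU's slot of the per-cpu variable, and real kernel code
would use the per-cpu and cmpxchg primitives instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Owner states named in the list above. */
enum lbr_owner {
    LBR_UNCLAIMED,
    LBR_EXCEPTION_DEBUG_USAGE,
    LBR_PERF_USAGE,
};

/* Assumption: user-space stand-in for one CPU's slot of the per-cpu variable. */
static _Atomic int lbr_owner_state = LBR_UNCLAIMED;

/* Hypothetical helper: take the LBR for `who` only if nobody holds it. */
static bool lbr_claim(enum lbr_owner who)
{
    int expected = LBR_UNCLAIMED;

    /* Succeeds only when the current state is LBR_UNCLAIMED. */
    return atomic_compare_exchange_strong(&lbr_owner_state, &expected, who);
}

/* Hypothetical helper: give the LBR back, only if `who` is the current holder. */
static bool lbr_release(enum lbr_owner who)
{
    int expected = who;

    return atomic_compare_exchange_strong(&lbr_owner_state, &expected,
                                          LBR_UNCLAIMED);
}
```

With this shape, whichever side claims first wins, and the other side's claim
fails until the holder releases, which is the mutual exclusion described in
the list.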

Now this solves the perf/exception concurrency on the LBR registers. If there is
a reschedule or a PMI during the exception, can that have an impact?
- case 1: nobody is handling LBR
  => no impact, exception handlers won't touch LBR
- case 2: perf is handling LBR
  => no impact, exception handlers won't touch LBR

- case 3: exception handlers are handling LBR

  - case 3a: simple user exception
    -> exception entry
    -> is kernel exception == false => bypass LBR handling
    -> exception handling

  - case 3b: simple kernel exception
    -> exception entry
    -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
    -> no reschedule, no PMI
    -> exception handling
    -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR

  - case 3c: kernel exception with PMI
    -> exception entry
    -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
    -> PMI
       can't touch LBR, as lbr_dump_state == EXCEPTION_OWNED
    -> exception handling
    -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR

  - case 3d: kernel exception with a reschedule inside
    -> exception entry
    -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
    -> exception handling
    -> context_switch()
       -> perf cannot touch LBR, nobody can
    -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR
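
The cases above can be replayed with a small user-space model. This is a
sketch under assumptions, not kernel code: struct cpu_model and the three
functions are hypothetical names, the "EXCEPTION_OWNED" test is taken to mean
the owner is LBR_EXCEPTION_DEBUG_USAGE, and stopping or starting the LBR is
reduced to flipping a flag.

```c
#include <stdbool.h>

enum lbr_owner { LBR_UNCLAIMED, LBR_EXCEPTION_DEBUG_USAGE, LBR_PERF_USAGE };

/* Hypothetical model of one CPU: LBR owner, recording state, perf accesses. */
struct cpu_model {
    enum lbr_owner owner;
    bool lbr_running;
    int perf_lbr_accesses;   /* how many times perf code touched the LBR */
};

/* Exception entry: STOP LBR only for a kernel exception owned by the handler. */
static void exception_entry(struct cpu_model *cpu, bool from_kernel)
{
    if (from_kernel && cpu->owner == LBR_EXCEPTION_DEBUG_USAGE)
        cpu->lbr_running = false;
}

/* Exception exit: START LBR again under the same condition. */
static void exception_exit(struct cpu_model *cpu, bool from_kernel)
{
    if (from_kernel && cpu->owner == LBR_EXCEPTION_DEBUG_USAGE)
        cpu->lbr_running = true;
}

/* PMI or context switch path: perf touches the LBR only when it owns it. */
static void perf_touch_lbr(struct cpu_model *cpu)
{
    if (cpu->owner != LBR_PERF_USAGE)
        return;              /* cases 3c and 3d: perf backs off */
    cpu->perf_lbr_accesses++;
}
```

Calling exception_entry(), then perf_touch_lbr(), then exception_exit() on a
CPU owned by the exception handler replays cases 3c and 3d: the LBR stays
frozen across the PMI or reschedule and perf's access count does not change.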

I might be very wrong in the description as I'm not that sharp on x86, but is
there a flaw in the above cases?

If not, a couple of tests and Thomas's per-cpu variable can solve the issue
while keeping the exception handler code as simple as Emmanuel proposed (given
the additional tests, which will be designed not to pollute the LBR), with only
a small impact on perf to solve the resource acquisition issue.

Cheers.

--
Robert