Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain support to use NMI-safe methods

From: Ingo Molnar
Date: Mon Jun 15 2009 - 14:09:24 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Then, the NMI handler would be changed to always write that value
> to %cr2 after it has done the operation that could fault, and do
> an atomic increment of the NMI sequence count. Then, we can do
> something like this in the page fault handler:
>
>     if (cr2 == MAGIC_CR2) {
>         static unsigned long my_seqno = -1;
>         if (my_seqno != nmi_seqno) {
>             my_seqno = nmi_seqno;
>             return;
>         }
>     }
>
> where the whole (and only) point of that "seqno" is to protect against
> user space doing something like
>
>     int i = *(int *)MAGIC_CR2;
>
> and causing infinite faults.

Heh - this is so tricky that it's disgusting! Lovely.
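
( For anyone following along, here is a rough user-space model of the
quoted scheme. It is only a sketch: the struct, the function names and
the MAGIC_CR2 value are made up for illustration, and plain variables
stand in for %cr2 and for the atomic sequence counter. Nothing here
touches real hardware state. )

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder; the mail does not say what value would be used. */
#define MAGIC_CR2 0xdeadbeefUL

/* Plain variables standing in for %cr2 and the two sequence numbers. */
struct cr2_model {
	unsigned long cr2;        /* models the %cr2 register */
	unsigned long nmi_seqno;  /* bumped at the end of every NMI */
	unsigned long my_seqno;   /* the 'static' in the quoted fault handler */
};

/* Models the end of the NMI handler: write_cr2(MAGIC_CR2); atomic_inc(&nmi_seqno); */
void cr2_model_nmi_exit(struct cr2_model *m)
{
	m->cr2 = MAGIC_CR2;
	m->nmi_seqno++;
}

/* Returns true if the fault handler would return early so the fault retriggers. */
bool cr2_model_fault_retries(struct cr2_model *m)
{
	if (m->cr2 == MAGIC_CR2 && m->my_seqno != m->nmi_seqno) {
		m->my_seqno = m->nmi_seqno;
		return true;
	}
	return false;
}

/* Walks through the two cases discussed in the quoted text. */
void cr2_model_demo(void)
{
	struct cr2_model m = { .cr2 = 0x1000, .nmi_seqno = 0,
			       .my_seqno = (unsigned long)-1 };

	/* An NMI clobbers cr2 before the fault handler gets to read it. */
	cr2_model_nmi_exit(&m);
	assert(cr2_model_fault_retries(&m));   /* first pass: return and retry */

	/* The retried access faults again and the CPU reloads the real address. */
	m.cr2 = 0x1000;
	assert(!cr2_model_fault_retries(&m));  /* handled as a normal fault */

	/* User space dereferences MAGIC_CR2 itself, with no new NMI. */
	m.cr2 = MAGIC_CR2;
	assert(!cr2_model_fault_retries(&m));  /* seqno unchanged: no retry loop */
}
```

In this model a user-space access to MAGIC_CR2 can cost at most one
spurious retry per NMI, never an endless loop, which is the whole job of
the seqno.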

And, since this appears to be a competition of sick ideas, an even
more disgusting hack might be to write to the IDT from the NMI
handler, install a NULL entry at #PF, and rely on the double
fault handler to detect faults - double faults don't clobber the cr2,
I think ...

( I think to protect the fragile and pure fabric of lkml against
moral corruption, disgusting patches must remain unsent and
disgusting ideas like this must absolutely stay unspoken. Hence
I have removed lkml from the Cc:. [Oops, I didn't ... too late,
and this mail has already been sent! :-/ ])

> If a real NMI happens, then nmi_seqno will always be different,
> and we'll just retry the fault (the NMI handler would do something
> like
>
>     write_cr2(MAGIC_CR2);
>     atomic_inc(&nmi_seqno);
>
> to set it all up).
>
> Anyway, I do think that the _correct_ solution is to not do page
> faults from within NMIs, but the above is an outline of how we
> could _try_ to handle it if we really really wanted to. IOW, the
> fact that cr2 gets corrupted is not insurmountable, exactly
> because we _could_ always just retrigger the page fault, and thus
> "re-create" the corrupted %cr2 value.
>
> Hacky, hacky. And I'm not sure how happy CPUs even are to have
> %cr2 written to, so we could hit CPU issues.

If cr2 cannot be safely written to on a CPU, that could be worked
around by counting the number of NMIs via a
percpu_add(this_nmi_count, 1) and retrying faults if any NMI
happened between the previous fault and this fault.

This has the disadvantage of potentially doubling the number of
page faults though. But it would certainly work as a tricky quirk to
this quirk, which is added to a rather quirky code-path to begin
with.
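
A minimal user-space sketch of that counting variant, just to make the
retry rule concrete. The struct and function names are hypothetical, an
ordinary field stands in for the per-CPU this_nmi_count, and the
counter sampled at the previous fault is an assumption about how the
"between the previous fault and this fault" check would be kept:

```c
#include <assert.h>
#include <stdbool.h>

/* Models the per-CPU state; field names are illustrative only. */
struct nmi_count_model {
	unsigned long this_nmi_count;  /* bumped by every NMI on this CPU */
	unsigned long seen_at_fault;   /* value sampled at the previous fault */
};

/* Models the NMI side: percpu_add(this_nmi_count, 1). */
void nmi_count_model_nmi(struct nmi_count_model *cpu)
{
	cpu->this_nmi_count++;
}

/*
 * Models the check at page fault entry. Returns true if any NMI ran
 * since the previous fault, in which case cr2 cannot be trusted and
 * the fault is simply retried.
 */
bool nmi_count_model_fault_retries(struct nmi_count_model *cpu)
{
	if (cpu->this_nmi_count != cpu->seen_at_fault) {
		cpu->seen_at_fault = cpu->this_nmi_count;
		return true;
	}
	return false;
}

/* Shows where the extra page faults come from. */
void nmi_count_model_demo(void)
{
	struct nmi_count_model cpu = { 0, 0 };

	/* No NMI since the last fault: handled directly. */
	assert(!nmi_count_model_fault_retries(&cpu));

	/* An NMI ran at any point since then: the next fault is retried once. */
	nmi_count_model_nmi(&cpu);
	assert(nmi_count_model_fault_retries(&cpu));
	assert(!nmi_count_model_fault_retries(&cpu));
}
```

Note that the retry triggers even when the NMI did not fault at all,
which is where the extra page faults come from: with frequent NMIs,
most faults would be taken twice.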

Ingo