Re: [PATCH] Revert "MIPS: Remove race window in page fault handling"

From: Lars Persson
Date: Wed Dec 03 2014 - 04:40:49 EST


Hi Leonid

First let me describe the mechanism of this race condition, which was a
fault in the kernel's MIPS architecture code. Specifically in its
implementation of lazy dcache flushing. AFAIK, it would only hit on
systems where the pagein code path writes to the page from the CPU.

The order of calls is:
flush_dcache_page() (from the FS's readpage)
set_pte_at()
update_mmu_cache()

The thread number one has executed the set_pte_at() when thread number
two hits the same page. It finds a valid PTE and proceeds to execute
code from a page that is not yet flushed to the point of I/D coherency.
That flush would happen in update_mmu_cache().

My patch does increase number of cache flushes for CoW yes and there
could be an optimization opportunity by playing tricks with the pte_t to
include information about executability of the mapping.

Reverting the patch is a big no-no, then we go back to a state of
undefined CPU behavior.

- Lars


On Wed, 2014-12-03 at 04:25 +0100, Leonid Yegoshin wrote:
> This reverts commit 64f23ab30b1fec86b91a48bf1f088840e5299971
> (commit 2a4a8b1e5d9d343e13ff22e19af7b353f7b52d6f in Linux 'master')
>
> Unfortunately the original commit effectively kills Imagination MIPS CPUs
> performance because it enforces page cache flush each time then PTE is changed
> for our CPUs. Basically - each CoW, each process start, each library attachment
> or detachment. And it happens even in "fully-coherent" systems (in MIPS sense -
> we have non coherent I-cache). Last tendency for increasing page size to 16K-64K
> even exaggerate the performance loss.
>
> Original commit discussion says:
>
> Kernel call stacks showed one thread handling an illegal instruction
> exception while another thread was somewhere around the
> set_pte_at/update_mmu_cache calls for the same page.
>
> If this is taken correct then it points to a major problem unrelated to MIPS -
> if by chance a first thread hits the page just before set_pte_at then it should
> get a non-present PTE: set_pte_at with _PAGE_PRESENT/_PAGE_VALID flags can be
> used only on "disactivated" PTE, after pte_clear* functions, effectively -
> non-present, non-valid. And application can get SIGSEGV. It is a major race
> condition and kernel should not offer it to applications. And in my
> understanding set_pte_at is used in cases then page is available after kernel
> finishes page handling, at least in theory.
>
> Test note: I ran MIPS 1004K with 8 VPEs on 4 cores around 1.5 month on SOAK test
> and never seen that problem on 3.10 kernel.
>
> Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
>
>



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/