Re: [PATCH] Revert "MIPS: Remove race window in page fault handling"

From: Leonid Yegoshin
Date: Fri Dec 05 2014 - 16:41:36 EST


Lars,

On 12/05/2014 01:32 AM, Lars Persson wrote:
Hi

Our setup includes both a non-DMA block device and a compressing
file-system (UBIFS). A flush_dcache_page() is issued by UBIFS so your
patch fixes another problem that we do not hit.

The stack trace is not available now. Do we need it for any further
analysis ? I think the mechanism of the race window is understood and it
depends on the __flush_dcache_page() deciding that the flush should be
postponed.

Unfortunately, the research of original case is still needed.
I looked into all cases of update_mmu_cache() besides HUGE page support and NUMA, and I see:

1. insert_pfn()
It is used to put a special page (read - VDSO) into memory map. No cache flush is needed here because page is done and flushed during system boot.

2. do_wp_page(), first occurrence
It has flush_cache_page() before it sets PTE in ptep_set_access_flags(). This flush is unconditional and affects all caches.

3. do_wp_page(), second case
It is done after preparing a clear new page or after COW. COW has an appropriate cache flush of destination in copy_user_highpage(). The immediate use of cleared new page as instruction (you had SIGILL, right?)... hm-m, something wrong in application in this case.

4. do_swap_page()
Well, it may be a case of flush_icache_page() is not used (see below) and page is taken from non-DMA swap. But I also recommend to look into

http://patchwork.linux-mips.org/patch/7615/

there is a bug in swap entry number presentation and it may affect your system.

5. do_anonymous_page()
The similar to case (3) - cleared new page, using of it as instruction page may point to some app problem.

6. do_set_pte()
It also has flush_icache_page() which may have impact if not implemented, see below.

7. handle_pte_fault()
Page is not touched and cache flush is NO-OP.

8. remove_migration_pte()
Well, it is a place for suspicion. But it should not run in parallel with any running thread - dirtying page while other thread is running is a way to disaster.

So, you see - if I understand it correctly, there is no place for failure... besides application misbehaviour or potential kernel bug in migration. Of course, I may miss something and that is a reason why stack trace is desirable.


I think the mechanism of the race window is understood and it
depends on the __flush_dcache_page() deciding that the flush should be
postponed.

As I remember, you said you use HIGHMEM patch, right? It changes a little __flush_dcache_page() and flush of any mapped page is not postponed anymore. So, it has an immediate effect for application pages.

- Leonid.



- Lars

On Fri, 2014-12-05 at 03:16 +0100, Leonid Yegoshin wrote:
(repeat mesg, first one went to wrong place)

Lars,

Do you have a stack trace or so then you found the second VPE between
set_pte_at and update_mmu_cache?
It would be interesting how it happens - generally, to get a consistent
SIGILL in applications due to misbehaviour of memory subsystem, the bug
in FS is not enough.

Hold on - do you use non-DMA file system?
If so, I advice you to try this simple patch:

Author: Leonid Yegoshin <yegoshin@xxxxxxxx>
Date: Tue Apr 2 14:20:37 2013 -0700

MIPS: (opt) Fix of reading I-pages from non-DMA FS devices for ID
cache separation

This optional fix provides a D-cache flush for instruction code
pages on
page faults. In case of non-DMA block device a driver doesn't know
that it
reads I-page and doesn't flush D-cache generally on systems without
cache aliasing. And that takes toll during page fault of
instruction pages.

It is not a perfect fix, it should be considered as a temporary fix.
The permanent fix would track page origin in page cache and flushes
D-cache
during reception of page from driver only but not at each page fault.
It is not done yet.

Change-Id: I43f5943d6ce0509729179615f6b81e77803a34ac
Author: Leonid Yegoshin <yegoshin@xxxxxxxx>
Signed-off-by: Leonid Yegoshin <yegoshin@xxxxxxxx>(imported from
commit 6ebd22eb7a3d9873582ebe990a77094f971652ee)(imported from commit
0caf3b4a1eebb64572e81e4df6fdb3abf12c70

arch/mips/include/asm/cacheflush.h:

@@ -61,6 +61,9 @@ static inline void flush_anon_page(struct
vm_area_struct *vma,
static inline void flush_icache_page(struct vm_area_struct *vma,
struct page *page)
{
+ if (cpu_has_dc_aliases ||
+ ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc))
+ __flush_dcache_page(page);
}

extern void (*flush_icache_range)(unsigned long start, unsigned
long end);


It fixed crash problems with non-DMA FS in a couple of our customers.
Without it the non-DMA root FS crashes are catastrophic in aliasing
systems but it is still a problem for I-cache too but much rare.

Unfortunately, it is also a performance hit, however is less than run a
page cache flush at each PTE setup. On 12/03/2014 06:03 AM, Lars Persson
wrote:
It is the flush_dcache_page() that was called from the file-system
reading the page contents into memory.

- Lars





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/