Re: [PATCH, v2] mm, numa: Turn 4K pte NUMA faults into effective hugepage ones

From: David Rientjes
Date: Tue Nov 20 2012 - 21:41:06 EST


On Tue, 20 Nov 2012, Ingo Molnar wrote:

> Reduce the 4K page fault count by looking around and processing
> nearby pages if possible.
>
> To keep the logic and cache overhead simple and straightforward
> we do a couple of simplifications:
>
> - we only scan in the HPAGE_SIZE range of the faulting address
> - we only go as far as the vma allows us
>
> Also simplify the do_numa_page() flow while at it and fix the
> previous double faulting we incurred due to not properly fixing
> up freshly migrated ptes.
>
> Suggested-by: Mel Gorman <mgorman@xxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
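
The two range limits listed in the changelog above can be sketched in
user-space C. This is only an illustration, not code from the patch: it
assumes that "the HPAGE_SIZE range of the faulting address" means the
HPAGE_SIZE-aligned window containing the address, it assumes a 2 MB huge
page size (as on x86-64), and the names struct scan_range and
numa_scan_range() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MB huge pages, as on x86-64. */
#define SKETCH_HPAGE_SIZE (UINT64_C(2) * 1024 * 1024)
#define SKETCH_HPAGE_MASK (~(SKETCH_HPAGE_SIZE - 1))

/* Half-open address range [start, end). Hypothetical type. */
struct scan_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: pick the range of addresses to look at for a
 * fault at 'addr' inside a vma covering [vm_start, vm_end).
 */
static struct scan_range numa_scan_range(uint64_t addr,
					 uint64_t vm_start, uint64_t vm_end)
{
	struct scan_range r;

	/* The faulting address must lie inside the vma. */
	assert(vm_start <= addr && addr < vm_end);

	/* Limit 1: stay within the huge-page-sized window around addr. */
	r.start = addr & SKETCH_HPAGE_MASK;
	r.end = r.start + SKETCH_HPAGE_SIZE;

	/* Limit 2: only go as far as the vma allows. */
	if (r.start < vm_start)
		r.start = vm_start;
	if (r.end > vm_end)
		r.end = vm_end;

	return r;
}
```

For a fault at 0x7f0000301000 in a vma covering
[0x7f0000300000, 0x7f0000380000), the sketch returns the vma range itself,
because the vma is smaller than the 2 MB window around the fault.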

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>

OK, this is significantly better: it cut the regression almost in half on
my system. With THP enabled:

numa/core at ec05a2311c35:          136918.34 SPECjbb2005 bops
numa/core at 01aa90068b12:          128315.19 SPECjbb2005 bops (-6.3%)
numa/core at 01aa90068b12 + patch:  132523.06 SPECjbb2005 bops (-3.2%)
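
The percentages above are relative to the ec05a2311c35 baseline. A minimal
sketch of that arithmetic follows; the helper names pct_change() and
recovered_fraction() are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Percent change of 'result' relative to 'baseline' (negative = slower). */
static double pct_change(double baseline, double result)
{
	assert(baseline > 0.0);
	return (result - baseline) / baseline * 100.0;
}

/*
 * Fraction of the lost throughput that a fix wins back:
 * (patched - regressed) / (baseline - regressed).
 */
static double recovered_fraction(double baseline, double regressed,
				 double patched)
{
	assert(baseline > regressed);
	return (patched - regressed) / (baseline - regressed);
}
```

With the bops figures above, pct_change() gives roughly -6.3% and -3.2%,
and recovered_fraction() gives roughly 0.49, i.e. the patch wins back just
under half of the lost throughput.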

Here's the newest perf top output, which is radically different from
before (the newly added numa/core functions are far less prominent among
the biggest consumers), but it still shows significant overhead from page
faults.

92.18%  perf-6697.map  [.] 0x00007fe2c5afd079
 1.20%  libjvm.so      [.] instanceKlass::oop_push_contents(PSPromotionManag
 1.05%  libjvm.so      [.] PSPromotionManager::drain_stacks_depth(bool)
 0.78%  libjvm.so      [.] PSPromotionManager::copy_to_survivor_space(oopDes
 0.59%  libjvm.so      [.] PSPromotionManager::claim_or_forward_internal_dep
 0.49%  [kernel]       [k] page_fault
 0.27%  libjvm.so      [.] Copy::pd_disjoint_words(HeapWord*, HeapWord*, unsigned lo
 0.27%  libc-2.3.6.so  [.] __gettimeofday
 0.19%  libjvm.so      [.] CardTableExtension::scavenge_contents_parallel(ObjectStar
 0.16%  [kernel]       [k] getnstimeofday
 0.14%  [kernel]       [k] _raw_spin_lock
 0.13%  [kernel]       [k] generic_smp_call_function_interrupt
 0.11%  [kernel]       [k] ktime_get
 0.11%  [kernel]       [k] rcu_check_callbacks
 0.10%  [kernel]       [k] read_tsc
 0.09%  libjvm.so      [.] os::javaTimeMillis()
 0.09%  [kernel]       [k] clear_page_c
 0.08%  [kernel]       [k] flush_tlb_func
 0.08%  [kernel]       [k] ktime_get_update_offsets
 0.07%  [kernel]       [k] task_tick_fair
 0.06%  [kernel]       [k] emulate_vsyscall
 0.06%  libjvm.so      [.] oopDesc::size_given_klass(Klass*)
 0.06%  [kernel]       [k] __do_page_fault
 0.04%  [kernel]       [k] __bad_area_nosemaphore
 0.04%  perf           [.] 0x000000000003310b
 0.04%  libjvm.so      [.] objArrayKlass::oop_push_contents(PSPromotionManager*, oop
 0.04%  [kernel]       [k] run_timer_softirq
 0.04%  [kernel]       [k] copy_user_generic_string
 0.03%  [kernel]       [k] task_numa_fault
 0.03%  [kernel]       [k] smp_call_function_many
 0.03%  [kernel]       [k] retint_swapgs
 0.03%  [kernel]       [k] update_cfs_shares
 0.03%  [kernel]       [k] error_sti
 0.03%  [kernel]       [k] _raw_spin_lock_irq
 0.03%  [kernel]       [k] update_curr
 0.02%  [kernel]       [k] write_ok_or_segv
 0.02%  [kernel]       [k] call_function_interrupt
 0.02%  [kernel]       [k] __do_softirq
 0.02%  [kernel]       [k] acct_update_integrals
 0.02%  [kernel]       [k] x86_pmu_disable_all
 0.02%  [kernel]       [k] apic_timer_interrupt
 0.02%  [kernel]       [k] tick_sched_timer