Re: [PATCH 2/3] x86: mm: Change tlb_flushall_shift for IvyBridge

From: Ingo Molnar
Date: Thu Dec 12 2013 - 08:13:22 EST



* Mel Gorman <mgorman@xxxxxxx> wrote:

> There was a large performance regression that was bisected to commit 611ae8e3
> (x86/tlb: enable tlb flush range support for x86). This patch simply changes
> the default balance point between a local and global flush for IvyBridge.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> ---
> arch/x86/kernel/cpu/intel.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index dc1ec0d..2d93753 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -627,7 +627,7 @@ static void intel_tlb_flushall_shift_set(struct cpuinfo_x86 *c)
> tlb_flushall_shift = 5;
> break;
> case 0x63a: /* Ivybridge */
> - tlb_flushall_shift = 1;
> + tlb_flushall_shift = 2;
> break;

I'd not be surprised if other CPU models showed similar weaknesses
under ebizzy as well.

I don't particularly like the tuning aspect of the whole feature: the
tunings are model specific and they seem to come out of thin air,
without explicit measurements visible.

In particular the first commit that added this optimization:

commit c4211f42d3e66875298a5e26a75109878c80f15b
Date: Thu Jun 28 09:02:19 2012 +0800

x86/tlb: add tlb_flushall_shift for specific CPU

already had these magic tunings, with no explanation about what kind
of measurement was done to back up those tunings.

I don't think this is acceptable and until this is cleared up I think
we might be better off turning off this feature altogether, or making
a constant, very low tuning point.

The original code came via:

611ae8e3f520 x86/tlb: enable tlb flush range support for x86

which references a couple of benchmarks, in particular a
micro-benchmark:

My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59
show that the random memory access on other CPU has 0~50% speed up
on a 2P * 4cores * HT NHM EP while do 'munmap'.

if the tunings were done with the micro-benchmark then I think they
are bogus, because AFAICS it does not measure the adversarial case of
the optimization.

So I'd say at minimum we need to remove the per model tunings, and
need to use very conservative defaults, to make sure we don't slow
down reasonable workloads.

( In theory madvise() could give us information about the usage
pattern of the vma - but in practice madvise() is rarely used and I
doubt ebizzy or other real-world apps are using it, so it's
meaningless. )

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/