Re: [PATCH v2] mm, slub: Use prefetchw instead of prefetch

From: Hyeonggon Yoo
Date: Sat Oct 16 2021 - 07:38:45 EST


Andrew, can you please update the patch to v2?

On Mon, Oct 11, 2021 at 02:43:31PM +0000, Hyeonggon Yoo wrote:
> Commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
> slab_alloc()") introduced prefetch_freepointer() because, when other CPUs
> free objects into a page that the current CPU owns, the freelist link is
> hot on the CPUs that freed the objects and possibly very cold on the
> current CPU.
>
> But if the freelist link chain is hot on the CPUs that freed the objects,
> it is better to invalidate their copies of that chain, because those CPUs
> are not going to access it again within a short time.
>
> So use prefetchw() instead of prefetch(). On architectures that support
> it, such as x86 and arm, prefetchw() invalidates the copies of the cache
> line held by other CPUs when prefetching it.
>
> Before:
>
> Time: 91.677
>
> Performance counter stats for 'hackbench -g 100 -l 10000':
> 1462938.07 msec cpu-clock # 15.908 CPUs utilized
> 18072550 context-switches # 12.354 K/sec
> 1018814 cpu-migrations # 696.416 /sec
> 104558 page-faults # 71.471 /sec
> 1580035699271 cycles # 1.080 GHz (54.51%)
> 2003670016013 instructions # 1.27 insn per cycle (54.31%)
> 5702204863 branch-misses (54.28%)
> 643368500985 cache-references # 439.778 M/sec (54.26%)
> 18475582235 cache-misses # 2.872 % of all cache refs (54.28%)
> 642206796636 L1-dcache-loads # 438.984 M/sec (46.87%)
> 18215813147 L1-dcache-load-misses # 2.84% of all L1-dcache accesses (46.83%)
> 653842996501 dTLB-loads # 446.938 M/sec (46.63%)
> 3227179675 dTLB-load-misses # 0.49% of all dTLB cache accesses (46.85%)
> 537531951350 iTLB-loads # 367.433 M/sec (54.33%)
> 114750630 iTLB-load-misses # 0.02% of all iTLB cache accesses (54.37%)
> 630135543177 L1-icache-loads # 430.733 M/sec (46.80%)
> 22923237620 L1-icache-load-misses # 3.64% of all L1-icache accesses (46.76%)
>
> 91.964452802 seconds time elapsed
>
> 43.416742000 seconds user
> 1422.441123000 seconds sys
>
> After:
>
> Time: 90.220
>
> Performance counter stats for 'hackbench -g 100 -l 10000':
> 1437418.48 msec cpu-clock # 15.880 CPUs utilized
> 17694068 context-switches # 12.310 K/sec
> 958257 cpu-migrations # 666.651 /sec
> 100604 page-faults # 69.989 /sec
> 1583259429428 cycles # 1.101 GHz (54.57%)
> 2004002484935 instructions # 1.27 insn per cycle (54.37%)
> 5594202389 branch-misses (54.36%)
> 643113574524 cache-references # 447.409 M/sec (54.39%)
> 18233791870 cache-misses # 2.835 % of all cache refs (54.37%)
> 640205852062 L1-dcache-loads # 445.386 M/sec (46.75%)
> 17968160377 L1-dcache-load-misses # 2.81% of all L1-dcache accesses (46.79%)
> 651747432274 dTLB-loads # 453.415 M/sec (46.59%)
> 3127124271 dTLB-load-misses # 0.48% of all dTLB cache accesses (46.75%)
> 535395273064 iTLB-loads # 372.470 M/sec (54.38%)
> 113500056 iTLB-load-misses # 0.02% of all iTLB cache accesses (54.35%)
> 628871845924 L1-icache-loads # 437.501 M/sec (46.80%)
> 22585641203 L1-icache-load-misses # 3.59% of all L1-icache accesses (46.79%)
>
> 90.514819303 seconds time elapsed
>
> 43.877656000 seconds user
> 1397.176001000 seconds sys
>
> Link: https://lkml.org/lkml/2021/10/8/598
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> ---
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 3d2025f7163b..ce3d8b11215c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -354,7 +354,7 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
>  
>  static void prefetch_freepointer(const struct kmem_cache *s, void *object)
>  {
> -	prefetch(object + s->offset);
> +	prefetchw(object + s->offset);
>  }
>  
>  static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
> --
> 2.27.0
>
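
For readers following the thread, here is a rough user-space sketch of the
idea in the patch: pop an object from a freelist, then issue a write-intent
prefetch on the next object's free pointer. This is not the kernel code.
struct toy_cache, the toy_*() helpers and toy_demo() are hypothetical names
made up for illustration, and GCC/Clang's __builtin_prefetch(addr, 1) stands
in for prefetchw(). Whether the hint actually invalidates remote copies
depends on the CPU and compiler target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kmem_cache: only the field used here. */
struct toy_cache {
	size_t offset;	/* byte offset of the free pointer inside an object */
};

/* Read the "next free object" link stored inside a free object. */
static void *toy_get_freepointer(const struct toy_cache *s, void *object)
{
	void *next;

	memcpy(&next, (char *)object + s->offset, sizeof(next));
	return next;
}

/* Store the "next free object" link inside a free object. */
static void toy_set_freepointer(const struct toy_cache *s, void *object,
				void *next)
{
	memcpy((char *)object + s->offset, &next, sizeof(next));
}

/* Second argument 1 means "prefetch with intent to write". */
static void toy_prefetch_freepointer(const struct toy_cache *s, void *object)
{
	__builtin_prefetch((char *)object + s->offset, 1);
}

/* Pop one object and hint that the next object's link will be used soon. */
static void *toy_alloc(const struct toy_cache *s, void **freelist)
{
	void *object = *freelist;
	void *next;

	if (!object)
		return NULL;

	next = toy_get_freepointer(s, object);
	*freelist = next;
	if (next)
		toy_prefetch_freepointer(s, next);
	return object;
}

/* Build a four-object freelist, drain it, and return the object count. */
int toy_demo(void)
{
	enum { NOBJ = 4, OBJ_SIZE = 64 };
	static unsigned char slab[NOBJ][OBJ_SIZE];
	struct toy_cache s = { .offset = 0 };
	void *freelist = slab[0];
	void *object;
	int i, n = 0;

	for (i = 0; i < NOBJ; i++)
		toy_set_freepointer(&s, slab[i],
				    i + 1 < NOBJ ? (void *)slab[i + 1] : NULL);

	while ((object = toy_alloc(&s, &freelist)) != NULL) {
		assert(object == (void *)slab[n]);	/* objects come back in list order */
		n++;
	}
	return n;
}
```

The prefetch is only a hint, so the sketch behaves the same with the second
argument set to 0 (read intent). Any difference would show up in cache
behaviour, not in the results.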