Re: [QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush

From: Gang Li
Date: Mon May 15 2023 - 23:16:33 EST


Hi all!

On 2023/5/5 20:28, Gang Li wrote:
Hi,

I found that in `ghes_unmap`, which runs while holding a spinlock, arm64 and x86
use different strategies for flushing the TLB.

# arm64 call trace:
```
holding a spin lock
ghes_unmap
 clear_fixmap
  __set_fixmap
   flush_tlb_kernel_range
```
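
For reference, this is roughly what the arm64 side does; a simplified paraphrase
of __set_fixmap() from arch/arm64/mm/mmu.c (not the exact upstream code, and
details may differ between kernel versions), showing that clearing a fixmap
entry ends with a kernel-range TLB flush:

```
/*
 * Simplified sketch of arm64's __set_fixmap() (arch/arm64/mm/mmu.c);
 * BUG_ON checks and some details are omitted.
 */
void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags)
{
	unsigned long addr = __fix_to_virt(idx);
	pte_t *ptep = fixmap_pte(addr);

	if (pgprot_val(flags)) {
		/* mapping a page: just install the PTE */
		set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
	} else {
		/* clear_fixmap(): tear down the PTE ... */
		pte_clear(&init_mm, addr, ptep);
		/* ... and flush this VA on all CPUs via broadcast TLBI */
		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	}
}
```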

# x86 call trace:
```
holding a spin lock
ghes_unmap
 clear_fixmap
  __set_fixmap
   mmu.set_fixmap
    native_set_fixmap
     __native_set_fixmap
      set_pte_vaddr
       set_pte_vaddr_p4d
        __set_pte_vaddr
         flush_tlb_one_kernel
```
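
For comparison, the tail of the x86 path; again a simplified paraphrase, this
time of __set_pte_vaddr() from arch/x86/mm/init_64.c. flush_tlb_one_kernel()
ends up as an INVLPG on the local CPU only, with no IPI sent to other CPUs:

```
/*
 * Simplified sketch of x86's __set_pte_vaddr() (arch/x86/mm/init_64.c);
 * not the exact upstream code.
 */
void __set_pte_vaddr(pud_t *pud, unsigned long vaddr, pte_t new_pte)
{
	pmd_t *pmd = pmd_offset(pud, vaddr);
	pte_t *pte = pte_offset_kernel(pmd, vaddr);

	set_pte(pte, new_pte);

	/*
	 * Flush only this one mapping, and only on the local CPU:
	 * flush_tlb_one_kernel() boils down to INVLPG here, without an
	 * IPI-based shootdown of the other CPUs' TLBs.
	 */
	flush_tlb_one_kernel(vaddr);
}
```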

arm64 broadcasts TLB invalidation in ghes_unmap, because a TLB entry can be
allocated for an address regardless of whether the CPU explicitly accesses
that memory.
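
The broadcast happens in hardware: flush_tlb_kernel_range()
(arch/arm64/include/asm/tlbflush.h) issues TLBI VAALE1IS, which invalidates the
entry on every CPU in the inner shareable domain without any IPI. Roughly (a
paraphrase, with helper macros simplified):

```
/*
 * Rough sketch of arm64's flush_tlb_kernel_range()
 * (arch/arm64/include/asm/tlbflush.h).
 */
static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	unsigned long addr;

	/* for large ranges, nuke the whole TLB instead of iterating */
	if ((end - start) > (MAX_TLBI_OPS * PAGE_SIZE)) {
		flush_tlb_all();
		return;
	}

	start = __TLBI_VADDR(start, 0);
	end = __TLBI_VADDR(end, 0);

	dsb(ishst);
	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
		/* "is" = inner shareable: broadcast to all CPUs in hardware */
		__tlbi(vaale1is, addr);
	dsb(ish);
	isb();
}
```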

Why doesn't x86 broadcast TLB invalidation in ghes_unmap? Is there any
difference between x86 and arm64 in their TLB allocation and invalidation
strategies?


I found this in Intel® 64 and IA-32 Architectures Software Developer
Manuals:

4.10.2.3 Details of TLB Use
Subject to the limitations given in the previous paragraph, the
processor may cache a translation for any linear address, even if that
address is not used to access memory. For example, the processor may
cache translations required for prefetches and for accesses that result
from speculative execution that would never actually occur in the
executed code path.

So both x86 and arm64 can cache TLB entries for prefetches and speculative
execution. Why, then, are their flush policies different?

Thanks,
Gang Li