Hi,
Just to check -- did you mean to drop the other Ccs? It would be good to keep
this discussion on-list if possible.
On Fri, Apr 28, 2023 at 01:49:46PM +0800, Gang Li wrote:
On 2023/4/27 15:30, Mark Rutland wrote:
On Thu, Apr 27, 2023 at 11:26:50AM +0800, Gang Li wrote:
1. I am curious to know the reason behind the design choice of flushing
the TLB on all cores for ARM64's clear_fixmap, while AMD64 only flushes
the TLB on a single core. Are there any TLB design details that make a
difference here?
I don't know why arm64 only clears this on a single CPU.
Sorry, I'm a bit confused.
Did you mean you don't know why *amd64* only clears this on a single
CPU?
Yes, sorry; I meant to say "amd64" rather than "arm64" here.
Looks like I should ask amd64 guy 😉
😉
On arm64 we *must* invalidate the TLB on all CPUs as the kernel page tables are
shared by all CPUs, and the architectural Break-Before-Make rules require
the TLB to be invalidated between two valid (but distinct) entries.
ghes_unmap is protected by a spin_lock, so only one core can access this
memory area at a time. I understand that there will be no TLB entries for
this memory area on other cores.
Is it because arm64 has speculative execution? Even if the core does not
hold the spin_lock, the TLB will still cache the critical section?
The architecture allows a CPU to allocate TLB entries at any time for any
reason, for any valid translation table entries reachable from the root in
TTBR{0,1}_ELx. That can be due to speculation, prefetching, and/or other
reasons.
Due to that, it doesn't matter whether or not a CPU explicitly accesses a
memory location -- TLB entries can be allocated regardless. Consequently, the
spinlock doesn't make any difference.
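As a rough illustration only, here is a toy userspace model of the above. It
is not kernel code, and every name in it (toy_system, toy_speculative_fill,
toy_remap, and so on) is made up for this sketch. It assumes one shared
page-table entry and one TLB slot per CPU, and it lets any CPU cache the entry
without ever taking the lock or touching the memory:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* arbitrary CPU count for the toy model */

/* Hypothetical model: one shared kernel PTE, one TLB slot per CPU. */
struct toy_system {
	bool pte_valid;			/* is the shared entry valid? */
	unsigned long pte_pa;		/* physical address it maps */
	bool tlb_valid[NR_CPUS];	/* per-CPU cached copy present? */
	unsigned long tlb_pa[NR_CPUS];	/* per-CPU cached physical address */
};

/*
 * Any CPU may allocate a TLB entry for any valid PTE at any time,
 * without an explicit access and regardless of who holds the lock.
 */
void toy_speculative_fill(struct toy_system *s, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (s->pte_valid) {
		s->tlb_valid[cpu] = true;
		s->tlb_pa[cpu] = s->pte_pa;
	}
}

/* Invalidate only the calling CPU's TLB slot. */
void toy_flush_local(struct toy_system *s, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	s->tlb_valid[cpu] = false;
}

/* Invalidate the TLB slot on every CPU (a broadcast invalidate). */
void toy_flush_all(struct toy_system *s)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		s->tlb_valid[cpu] = false;
}

/*
 * Remap the slot: break (invalidate the PTE), flush, then make (install
 * the new PTE). Returns how many CPUs still cache the old translation.
 */
int toy_remap(struct toy_system *s, int cpu, unsigned long new_pa,
	      bool broadcast)
{
	unsigned long old_pa = s->pte_pa;
	int stale = 0;

	s->pte_valid = false;			/* break */
	if (broadcast)
		toy_flush_all(s);
	else
		toy_flush_local(s, cpu);
	s->pte_pa = new_pa;			/* make */
	s->pte_valid = true;

	for (int i = 0; i < NR_CPUS; i++)
		if (s->tlb_valid[i] && s->tlb_pa[i] == old_pa)
			stale++;
	return stale;
}

/* Scenario: every CPU has cached the old mapping, then CPU 0 remaps. */
int toy_stale_after_remap(bool broadcast)
{
	struct toy_system s = { .pte_valid = true, .pte_pa = 0x1000 };

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		toy_speculative_fill(&s, cpu);
	return toy_remap(&s, 0, 0x2000, broadcast);
}
```

In this model, toy_stale_after_remap(false) leaves NR_CPUS - 1 CPUs holding
the old translation, while toy_stale_after_remap(true) leaves none. The lock
never appears in the model because it has no effect on which CPUs can
allocate TLB entries.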
Thanks,
Mark.