Re: [QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush

From: Mark Rutland
Date: Tue May 09 2023 - 10:30:33 EST


On Sat, May 06, 2023 at 10:51:23AM +0800, Gang Li wrote:
> Hi,
>
> On 2023/4/28 17:27, Mark Rutland wrote:
> > The architecture allows a CPU to allocate TLB entries at any time for any
> > reason, for any valid translation table entries reachable from the root
> > in TTBR{0,1}_ELx. That can be due to speculation, prefetching, and/or other
> > reasons.
>
> TLB entries may be allocated due to prefetching or branch prediction. Will
> they be invalidated when the prediction fails?

No; once allocated, TLB entries are allowed to remain until explicitly
invalidated.

See below for more detail.

> > Due to that, it doesn't matter whether or not a CPU explicitly accesses a
> > memory location -- TLB entries can be allocated regardless. Consequently,
> > the spinlock doesn't make any difference.
>
> And is there any kind of ARM manual or guide that explains these details to
> help us program better?

There's no guide that I am aware of, but this is described in the ARM ARM. The
current release (ARM DDI 0487J.a) can be found at:

https://developer.arm.com/documentation/ddi0487/ja

... and in future, the latest version should be available at:

https://developer.arm.com/documentation/ddi0487/latest

In the latest release (ARM DDI 0487J.a) relevant information can be found in
section D8 "The AArch64 Virtual Memory System Architecture", with key
information in D8.13 "Translation Lookaside Buffers" and D8.14 "TLB
maintenance".

For example, early in D8.13 we have the rule:

| R_SQBCS
|
| When address translation is enabled, a translation table entry for an
| in-context translation regime that does not cause a Translation fault, an
| Address size fault, or an Access flag fault is permitted to be cached in a
| TLB or intermediate TLB caching structure as the result of an explicit or
| speculative access.
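
As a rough illustration of that rule, here is a toy C model. It is only a
sketch: struct toy_pte, struct toy_tlb and the toy_*() helpers are hypothetical
names made up for this example, not kernel or architectural interfaces, and
real hardware is also free to evict entries at any time. The point it tries to
show is that a fill behaves the same for explicit and speculative accesses,
that only non-faulting entries may be cached, and that nothing but an explicit
invalidate is guaranteed to remove an entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical types and helpers, not kernel APIs. */
#define TOY_TLB_SIZE 8

struct toy_pte {
	bool valid;		/* false => Translation fault */
	bool af;		/* false => Access flag fault */
	bool addr_in_range;	/* false => Address size fault */
};

struct toy_tlb {
	uint64_t va[TOY_TLB_SIZE];
	bool present[TOY_TLB_SIZE];
};

/* Per the quoted rule: only entries that cause none of the faults may be cached. */
static bool toy_pte_cacheable(const struct toy_pte *pte)
{
	return pte->valid && pte->af && pte->addr_in_range;
}

static bool toy_tlb_has(const struct toy_tlb *tlb, uint64_t va)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		if (tlb->present[i] && tlb->va[i] == va)
			return true;
	return false;
}

/*
 * Model a table walk caching an entry. 'speculative' is deliberately
 * ignored: an explicit and a speculative access are treated identically,
 * and a later misprediction does not undo the fill.
 * Returns true if the entry is cached after the call.
 */
static bool toy_tlb_fill(struct toy_tlb *tlb, uint64_t va,
			 const struct toy_pte *pte, bool speculative)
{
	(void)speculative;
	assert(tlb && pte);

	if (!toy_pte_cacheable(pte))
		return false;
	if (toy_tlb_has(tlb, va))
		return true;

	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		if (!tlb->present[i]) {
			tlb->va[i] = va;
			tlb->present[i] = true;
			return true;
		}
	}
	return false;	/* Toy TLB is full; this model does not evict. */
}

/* The only operation in this model that removes an entry. */
static void toy_tlb_invalidate(struct toy_tlb *tlb, uint64_t va)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		if (tlb->present[i] && tlb->va[i] == va)
			tlb->present[i] = false;
}
```

In this model a speculative toy_tlb_fill() for a valid entry leaves the entry
cached until toy_tlb_invalidate() is called, regardless of whether software
ever touched the address. On real hardware the invalidate step corresponds to
TLBI instructions plus the required barriers, as covered in D8.14.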

Thanks,
Mark.