Re: Two questions about cache coherency on arm platforms

From: Changbin Du
Date: Mon Mar 23 2020 - 12:15:46 EST


Hi Mark,
Thanks for your answer. I still don't understand the first question.

On Mon, Mar 23, 2020 at 01:17:20PM +0000, Mark Rutland wrote:
> On Mon, Mar 23, 2020 at 08:35:26PM +0800, Changbin Du wrote:
> > Hi, All,
> > I am not very familiar with ARM processors. I have two questions about
> > cache coherency. Could anyone help me?
> >
> > 1. How is cache coherency maintained on an ARMv8 big.LITTLE system?
> > As far as I know, big cores and little cores are in separate clusters on
> > a big.LITTLE system.
>
> This is often true, but not always the case. For example, with DSU big
> and little cores can be placed within the same cluster.
>
Yes, it is true for DynamIQ that big and LITTLE cores can be placed within the
same cluster. But I don't understand how Linux supported big.LITTLE before DynamIQ.

I read the description below in the ARM Cortex-A Series Programmer's Guide for
ARMv8-A:
| big.LITTLE software models require transparent and efficient transfer of data between big and LITTLE clusters.
| Coherency between clusters is provided by a cache-coherent interconnect such as the ARM CoreLink CCI-400 described in Chapter 14.

So I think big cores and little cores are in different clusters in this
case. Does that mean they are not within the same Inner Shareable domain?

> > And cache coherence between clusters requires that the
> > memory regions be marked as 'Outer Shareable', which is very expensive.
>
> This is not correct.
>
> Linux requires that all cores it uses are within the same Inner
> Shareable domain, regardless of whether they are in distinct clusters.
> Linux does not support systems where cores are in distinct Inner
> Shareable domains.
>
I see. Thanks.
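
For anyone reading this in the archive, here is a minimal sketch of how the
shareability attribute is encoded in an ARMv8 stage 1 page or block
descriptor, where SH[1:0] sits in bits [9:8]. The helper and enum names are
made up for illustration and are not kernel identifiers. As far as I can
tell, the arm64 kernel's PTE_SHARED is (3 << 8), i.e. Inner Shareable, which
matches the requirement described above.

```c
#include <stdint.h>

/* SH[1:0] occupies bits [9:8] of a stage 1 page/block descriptor. */
#define SH_SHIFT 8
#define SH_MASK  (UINT64_C(3) << SH_SHIFT)

/* Hypothetical names, chosen only for this sketch. */
enum shareability {
	SH_NON_SHAREABLE   = 0, /* 0b00 */
	SH_RESERVED        = 1, /* 0b01 */
	SH_OUTER_SHAREABLE = 2, /* 0b10 */
	SH_INNER_SHAREABLE = 3, /* 0b11 */
};

/* Extract the shareability field from a raw descriptor value. */
static enum shareability pte_shareability(uint64_t pte)
{
	return (enum shareability)((pte & SH_MASK) >> SH_SHIFT);
}

/* Build a descriptor value with the shareability field replaced. */
static uint64_t pte_set_shareability(uint64_t pte, enum shareability sh)
{
	return (pte & ~SH_MASK) | ((uint64_t)sh << SH_SHIFT);
}
```

A descriptor with bits [9:8] set to 0b11 decodes as Inner Shareable, which
is what I would expect for normal memory mapped by Linux.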

> This is the intended use of the architecture. Per ARM DDI 0487E.a page
> B2-144:
>
> | This architecture assumes that all PEs that use the same operating
> | system or hypervisor are in the same Inner Shareable shareability
> | domain.
>
> ... where a PE is a "Processing Element", which you can think of as a
> single core.
>
> > I have checked the kernel code, and it seems to only require coherence in
> > the 'Inner Shareable' domain. So my question is how can Linux guarantee
> > cache coherence in the 'CPU migration' or 'Global Task Scheduling' models,
> > in which both clusters are active at the same time? For example, a thread
> > ran in Cluster A and modified 'Inner Shareable' memory, then it migrates
> > to Cluster B.
>
> As above, this works because all the relevant cores are within the same
> Inner Shareable domain.
>
> > 2. ARM64 cache maintenance code sync_icache_aliases() for non-aliasing icache.
> > In the Linux kernel on the arm64 platform, the function sync_icache_aliases()
> > is used to sync the i-cache and d-cache. I understand the aliasing case, but
> > for the non-aliasing case why does it just do "dc cvau" (in __flush_icache_range())
> > without really invalidating the icache?
>
> The __flush_icache_range/__flush_cache_user_range assembly function does
> both the D-cache maintenance with DC CVAU, then the I-cache maintenance
> with IC IVAU, so I think you have misread it.
>
Yes. I missed the IC IVAU instruction in the invalidate_icache_by_line
macro.
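
To check my understanding, here is a rough C sketch of the ordering Mark
describes: clean each D-cache line to the Point of Unification with DC CVAU,
issue a barrier, then invalidate each I-cache line with IC IVAU, followed by
DSB and ISB. This is not the kernel's code. The function name is made up, and
the cache line size is passed in as a parameter (an assumption for brevity;
as far as I know the kernel derives it from CTR_EL0). On non-arm64 builds the
instructions are compiled out and only the loop structure remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: make instructions written to [start, end) visible
 * to instruction fetch. Returns the number of by-line operations issued.
 */
static size_t sync_icache_range_sketch(uintptr_t start, uintptr_t end,
				       size_t line)
{
	size_t ops = 0;
	uintptr_t p;

	/* The line size must be a non-zero power of two. */
	assert(line != 0 && (line & (line - 1)) == 0);

	/* Step 1: clean the D-cache to the PoU, one line at a time. */
	for (p = start & ~((uintptr_t)line - 1); p < end; p += line) {
#if defined(__aarch64__)
		__asm__ volatile("dc cvau, %0" : : "r"(p) : "memory");
#endif
		ops++;
	}
#if defined(__aarch64__)
	/* Ensure the cleans complete before the invalidations start. */
	__asm__ volatile("dsb ish" : : : "memory");
#endif

	/* Step 2: invalidate the I-cache to the PoU, one line at a time. */
	for (p = start & ~((uintptr_t)line - 1); p < end; p += line) {
#if defined(__aarch64__)
		__asm__ volatile("ic ivau, %0" : : "r"(p) : "memory");
#endif
		ops++;
	}
#if defined(__aarch64__)
	/* Complete the invalidations, then resynchronize instruction fetch. */
	__asm__ volatile("dsb ish\n\tisb" : : : "memory");
#endif

	return ops;
}
```

So for a 256-byte, 64-byte-aligned range with 64-byte lines, this issues four
cleans followed by four invalidations.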

> Thanks,
> Mark.
>
> > Will i-cache refill from L2 cache?
> >
> > void sync_icache_aliases(void *kaddr, unsigned long len)
> > {
> > 	unsigned long addr = (unsigned long)kaddr;
> >
> > 	if (icache_is_aliasing()) {
> > 		__clean_dcache_area_pou(kaddr, len);
> > 		__flush_icache_all();
> > 	} else {
> > 		/*
> > 		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
> > 		 * for user mappings.
> > 		 */
> > 		__flush_icache_range(addr, addr + len);
> > 	}
> > }
> >
> > --
> > Cheers,
> > Changbin Du

--
Cheers,
Changbin Du