Re: [PATCH v10 3/6] riscv: mm: dma-noncoherent: nonstandard cache operations support

From: Jisheng Zhang
Date: Mon Jul 31 2023 - 11:55:45 EST


On Mon, Jul 31, 2023 at 07:39:30AM +0200, Arnd Bergmann wrote:
> On Mon, Jul 31, 2023, at 02:49, Guo Ren wrote:
> > On Mon, Jul 31, 2023 at 4:36 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >>
> >> On Sun, Jul 30, 2023, at 17:42, Emil Renner Berthing wrote:
> >> > On Sun, 30 Jul 2023 at 17:11, Jisheng Zhang <jszhang@xxxxxxxxxx> wrote:
> >>
> >> >> > +
> >> >> > static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> >> >> > {
> >> >> > void *vaddr = phys_to_virt(paddr);
> >> >> >
> >> >> > +#ifdef CONFIG_RISCV_NONSTANDARD_CACHE_OPS
> >> >> > + if (unlikely(noncoherent_cache_ops.wback)) {
> >> >>
> >> >> I'm worried about the performance impact here.
> >> >> For unified kernel Image reason, RISCV_NONSTANDARD_CACHE_OPS will be
> >> >> enabled by default, so standard CMO and T-HEAD's CMO platform's
> >> >> performance will be impacted, because even an unlikely is put
> >> >> here, the check action still needs to be done.
> >> >
> >> > On IRC I asked why not use a static key so the overhead is just a
> >> > single nop when the standard CMO ops are available, but the consensus
> >> > seemed to be that the flushing would completely dominate this branch.
> >> > And on platforms with the standard CMO ops the branch be correctly
> >> > predicted anyway.
> >>
> >> Not just the flushing, but also loading back the invalidated
> >> cache lines afterwards is just very expensive. I don't think
> >> you would be able to measure a difference between the static

I read this as: the cache clean/inv is so expensive that the static
key saving percentage is trivial, is this understanding right?

this could be measured by writing a small benchmark kernel module
which just calls cache clean/inv a buf(for example 1500Bytes)in a loop.

> >> key and a correctly predicted branch on any relevant usecase here.
> > Maybe we should move CMO & THEAD ops to the noncoherent_cache_ops, and
> > only keep one of them.
> >
> > I prefer noncoherent_cache_ops, it's more maintance than ALTERNATIVE.
>
> I think moving the THEAD ops at the same level as all nonstandard
> operations makes sense, but I'd still leave CMO as an explicit
> fast path that avoids the indirect branch. This seems like the right
> thing to do both for readability and for platforms on which the
> indirect branch has a noticeable overhead.
>
> Arnd