Re: [PATCH] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO

From: Arnd Bergmann
Date: Mon Oct 02 2023 - 08:16:02 EST


On Wed, Sep 13, 2023, at 02:06, Guo Ren wrote:
> On Wed, Sep 13, 2023 at 3:00 AM Jessica Clarke <jrtc27@xxxxxxxxxx> wrote:
>>
>> On 12 Sep 2023, at 11:53, Guo Ren <guoren@xxxxxxxxxx> wrote:
>> > Please remove the thead_errata_cache_wback because T-HEAD processors
>> > would prioritize using an invalid cacheline instead of evicting an
>> > existing cacheline. When we do dcache clean, the following operations
>> > are to let other interconnect masters read. So, keeping wback_inv for
>> > T-HEAD processors is the best choice, and maybe some other processors'
>> > vendor has a different idea, but please use the wback_inv instead of
>> > wback_only for the T-HEAD processors.
>>
>> Unless you can demonstrate that your cores have significantly worse
>> performance when using wback instead of wback_inv I do not think the
>> non-standard implementation should deviate from the semantics of the
>> standard one. There are efforts to unify the implemented semantics of
>> the operations across architectures and this would obstruct those.
>
> I'm afraid I have to disagree with the view that this obstructs
> "unifying the implemented semantics of the operations across
> architectures."
>
> static const struct riscv_nonstd_cache_ops thead_errata_cmo_ops = {
> - .wback = &thead_errata_cache_wback,
> + .wback = &thead_errata_cache_wback_inv,
> .inv = &thead_errata_cache_inv,
> .wback_inv = &thead_errata_cache_wback_inv,
>
> I don't see how the above patch obstructs unifying. On the contrary,
> it decreases the custom function, which could help unify. Could you
> give the least respect for the vendor's choice?

Since the email thread popped up after the latest replies, I saw
that I had not replied yet. I agree with Jessica here: we need to
ensure that the callback functions do what the interface requires,
across all architectures and CPU implementations. The choice to
call wback or wback_inv is a matter of optimization in the caller,
and we may well end up calling .wback_inv() for cases that currently
rely on ,wback(), if we can show this to be faster on certain
CPUs, and other cases (e.g. repeatedly writing and flushing
individual bytes to single DMA buffer) clearly rely on keeping
the cache line.

Other policy questions like whether to use .wback or .inv before
a DMA from device also still need a better answer, which we must
decide globally rather than per CPU implementation.

Arnd