Re: [PATCH v2] riscv: errata: andes: Probe for IOCP only once in boot stage

From: Geert Uytterhoeven
Date: Thu Nov 30 2023 - 11:26:37 EST


Hi Prabhakar,

On Thu, Nov 30, 2023 at 5:23 PM Lad, Prabhakar
<prabhakar.csengg@xxxxxxxxx> wrote:
> On Thu, Nov 30, 2023 at 2:34 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > On Thu, Nov 30, 2023 at 1:56 PM Prabhakar <prabhakar.csengg@xxxxxxxxx> wrote:
> > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@xxxxxxxxxxxxxx>
> > >
> > > We need to probe for IOCP only once during boot stage, as we were probing
> > > for IOCP for all the stages this caused the below issue during module-init
> > > stage,
> > >
> > > [9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
> > > [9.027153] Oops [#1]
> > > [9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
> > > [9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ #57
> > > [9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > [9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
> > > [9.055558] ra : andes_errata_patch_func+0x4a/0x52
> > > [9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
> > > [9.067607] gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
> > > [9.074795] t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
> > > [9.081984] s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
> > > [9.089172] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
> > > [9.096360] a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
> > > [9.103548] s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
> > > [9.110736] s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
> > > [9.117924] s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
> > > [9.125113] s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
> > > [9.132301] t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
> > > [9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
> > > [9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
> > > [9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
> > > [9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
> > > [9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
> > > [9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
> > > [9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
> > > [9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
> > > [9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
> > > [9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
> > > [9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
> > > [9.205994] ---[ end trace 0000000000000000 ]---
> > >
> > > This is because we called riscv_noncoherent_supported() for all the stages
> > > during IOCP probe. riscv_noncoherent_supported() function sets
> > > noncoherent_supported variable to true which has an annotation set to
> > > "__ro_after_init" due to which we were seeing the above splat. Fix this by
> > > probing for IOCP only once in boot stage by having a boolean variable
> > > is_iocp_probe_done which will be set to true upon IOCP probe in
> > > errata_probe_iocp() and we bail out early if is_iocp_probe_done is set.
> > >
> > > While at it make return type of errata_probe_iocp() to void as we were
> > > not checking the return value in andes_errata_patch_func().
> > >
> > > Fixes: e021ae7f5145 ("riscv: errata: Add Andes alternative ports")
> > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@xxxxxxxxxxxxxx>
> > > ---
> > > v1->v2
> > > * As RISCV_ALTERNATIVES_BOOT stage can happen twice add a is_iocp_probe_done
> > > variable to probe for IOCP only once.
> > > * Updated commit message
> > > * Make return value of errata_probe_iocp() to void
> >
> > Thanks for the update!
> >
> > > --- a/arch/riscv/errata/andes/errata.c
> > > +++ b/arch/riscv/errata/andes/errata.c
> > > @@ -38,29 +38,36 @@ static long ax45mp_iocp_sw_workaround(void)
> > > return ret.error ? 0 : ret.value;
> > > }
> > >
> > > -static bool errata_probe_iocp(unsigned int stage, unsigned long arch_id, unsigned long impid)
> > > +static void errata_probe_iocp(unsigned int stage, unsigned long arch_id, unsigned long impid)
> > > {
> > > + static bool is_iocp_probe_done;
> >
> > done?
> >
> OK I'll rename it to "done".
>
> > > +
> > > if (!IS_ENABLED(CONFIG_ERRATA_ANDES_CMO))
> > > - return false;
> > > + return;
> > > +
> > > + if (is_iocp_probe_done)
> > > + return;
> > >
> >
> > Why not keep it simple, and just do
> >
> > done = true;
> >
> OK.
>
> > here?
> > Can arch_id or impid suddenly change, so you have to recheck?
> I only check arch_id and impid here. Are you suggesting I drop it?

No, I am not suggesting dropping it.
I suggested moving the "done = true" up, right after the "if (done)" check,
so the arch_id/impid check runs only once, too.
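
For illustration, a minimal userspace sketch of the probe-once pattern with
the flag set up front. This is not the kernel code: the demo_* functions, the
DEMO_* ID values, and the two counters are made-up stand-ins so the flow can
be followed outside the kernel.

```c
#include <stdbool.h>

/* Hypothetical placeholder IDs; the real ones are the ANDESTECH_AX45MP_* macros. */
#define DEMO_MARCHID 0x45UL
#define DEMO_MIMPID  0x500UL

static int workaround_calls;   /* counts simulated SBI calls */
static int noncoherent_calls;  /* counts writes to the "ro after init" state */

/* Stand-in for ax45mp_iocp_sw_workaround(): nonzero means the workaround is needed. */
static long demo_iocp_sw_workaround(void)
{
	workaround_calls++;
	return 1;
}

/* Stand-in for riscv_noncoherent_supported(), which must not run after init. */
static void demo_noncoherent_supported(void)
{
	noncoherent_calls++;
}

static void demo_probe_iocp(unsigned long arch_id, unsigned long impid)
{
	static bool done;

	if (done)
		return;

	/* Set the flag first, so every exit path below counts as "probed". */
	done = true;

	if (arch_id != DEMO_MARCHID || impid != DEMO_MIMPID)
		return;

	if (!demo_iocp_sw_workaround())
		return;

	demo_noncoherent_supported();
}
```

Calling demo_probe_iocp() three times in a row (think early boot, boot, and a
module load) reaches the simulated SBI call and the noncoherent setup only on
the first call, and no exit path needs its own "done = true".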

> > If the SBI call in ax45mp_iocp_sw_workaround() fails, is there really
> > a need to try it again later?
> >
> No if it fails we just continue with a broken system.
>
> Cheers,
> Prabhakar
> > > if (arch_id != ANDESTECH_AX45MP_MARCHID || impid != ANDESTECH_AX45MP_MIMPID)
> > > - return false;
> > > + return;
> > >
> > > - if (!ax45mp_iocp_sw_workaround())
> > > - return false;
> > > + if (!ax45mp_iocp_sw_workaround()) {
> > > + is_iocp_probe_done = true;
> > > + return;
> > > + }
> > >
> > > /* Set this just to make core cbo code happy */
> > > riscv_cbom_block_size = 1;
> > > riscv_noncoherent_supported();
> > > -
> > > - return true;
> > > + is_iocp_probe_done = true;
> > > }

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds