Re: Re: Re: [PATCH V2 1/3] riscv: Add Zicbop instruction definitions & cpufeature

From: Andrew Jones
Date: Wed Jan 03 2024 - 14:44:58 EST


On Wed, Jan 03, 2024 at 07:49:44AM +0100, Andrew Jones wrote:
> On Wed, Jan 03, 2024 at 02:13:00PM +0800, Guo Ren wrote:
> > On Tue, Jan 2, 2024 at 6:32 PM Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Sun, Dec 31, 2023 at 03:29:51AM -0500, guoren@xxxxxxxxxx wrote:
...
> > > > #define HFENCE_VVMA(vaddr, asid) \
> > > > @@ -196,4 +244,16 @@
> > > > INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0), \
> > > > RS1(base), SIMM12(4))
> > > >
> > > > +#define CBO_PREFETCH_I(base, offset) \
> > > > + INSN_S(OPCODE_OP_IMM, FUNC3(6), __RS2(0), \
> > > > + SIMM12(offset), RS1(base))
> > > > +
> > > > +#define CBO_PREFETCH_R(base, offset) \
> > > > + INSN_S(OPCODE_OP_IMM, FUNC3(6), __RS2(1), \
> > > > + SIMM12(offset), RS1(base))
> > > > +
> > > > +#define CBO_PREFETCH_W(base, offset) \
> > > > + INSN_S(OPCODE_OP_IMM, FUNC3(6), __RS2(3), \
> > > > + SIMM12(offset), RS1(base))
> > >
> > > Shouldn't we ensure the lower 5-bits of offset are zero by masking it?
> > The spec says:
> > "These instructions operate on the cache block whose effective address
> > is the sum of the base address specified in rs1 and the sign-extended
> > offset encoded in imm[11:0], where imm[4:0] shall equal 0b00000. The
> > effective address is translated into a corresponding physical address
> > by the appropriate translation mechanisms."
> >
> > So, the user of prefetch.w should keep imm[4:0] zero.
>
> Yes, the user _should_ keep imm[4:0] zero. Unless we can validate at
> compile time that all users have passed offsets with the lower 5-bits
> set to zero, then I think we should mask them here, since I'd rather
> not provide the user a footgun.
>
> > Just like the
> > patch has done, the whole imm[11:0] is zero.
>
> That's just one possible use, and I think exposing the offset operand to
> users makes sense for unrolled sequences of invocations, so I wouldn't
> count on offset always being zero.
>
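
For what compile-time validation could look like, here is a minimal sketch
in plain C11. CBOP_CHECKED_OFFSET and cbop_offset_demo are hypothetical
names, not part of the patch, and this only works when the offset is an
integer constant expression:

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

/*
 * Hypothetical compile-time guard (not from the patch): evaluates to 'off',
 * but the build fails if 'off' has any of bits [4:0] set, or if 'off' is
 * not an integer constant expression.
 */
#define CBOP_CHECKED_OFFSET(off)					\
	(sizeof(struct {						\
		static_assert(((off) & 0x1f) == 0,			\
			      "prefetch offset must have imm[4:0] == 0"); \
		int unused;						\
	}) ? (off) : 0)

/* Example use: 64 has its low five bits clear, so this builds. */
static inline int32_t cbop_offset_demo(void)
{
	return CBOP_CHECKED_OFFSET(64);
}
```

Passing something like 33 would fail the build rather than silently encode
a reserved immediate. Callers that compute the offset at run time could not
use a check like this, which is why masking is the safer default.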

Another thought along these lines is that a base which isn't block-size
aligned may not "work". The spec says:

"""
...instruction indicates to hardware that the cache block whose effective
address is the sum of the base address specified in rs1 and the
sign-extended offset encoded in imm[11:0], where imm[4:0] equals
0b00000, is likely to be accessed...
"""

which implies we need an effective address which maps to a cache block.
However, unlike a nonzero imm[4:0], I don't expect a 'base' that isn't
block-size aligned to cause a problem with the instruction itself; the
instruction just might not do anything.

I think we need to add DT parsing of riscv,cbop-block-size and then
use it to mask the base address in the callers of these macros. (That
said, I still think we need to mask offset here as well.)
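
Roughly, the two masks would look like the sketch below. This is plain C
for illustration, not the macro change itself: cbop_mask_offset and
cbop_align_base are hypothetical helpers, and block_size stands in for
whatever value would be parsed from riscv,cbop-block-size (assumed to be a
power of two).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clear imm[4:0] so only imm[11:5] of the offset
 * reach the encoding. Works for negative offsets too (rounds toward
 * negative infinity to a multiple of 32).
 */
static inline int32_t cbop_mask_offset(int32_t offset)
{
	return offset & ~0x1f;
}

/*
 * Hypothetical helper: round 'base' down to the start of its cache block.
 * 'block_size' is assumed to be a nonzero power of two.
 */
static inline uintptr_t cbop_align_base(uintptr_t base, uintptr_t block_size)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	return base & ~(block_size - 1);
}
```

With a 64-byte block, a base of 0x1047 would become 0x1040 and an offset of
0x47 would become 0x40.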

Thanks,
drew