Re: [PATCH] ARM: vfp: Add vudot opcode to VFP undef hook

From: Mark-PK Tsai
Date: Thu Sep 21 2023 - 13:49:11 EST


> On 21/09/2023 3:13 am, Mark-PK Tsai wrote:
> >> On 2023-09-20 09:39, Mark-PK Tsai wrote:
> >>> Add vudot opcode to the VFP undef hook to fix the
> >>> potentially undefined instruction error when the
> >>> user space executes vudot instruction.
> >>
> >> Did the kernel expose a hwcap to say that the dot product extension is
> >> supported? I'm pretty sure it didn't, so why would userspace expect this
> >> to work? ;)
> >
> > The hwcap for dotprod has been exported since commit:
> >
> > 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
> >
> >>
> >> IIRC Amit was looking at defining the hwcaps to align with arm64 compat,
> >> but I believe that series faltered since most of them weren't actually
> >> needed (and I think at that point it was still missing the VFP support
> >> code parts). It would be nice if someone could pick up and combine both
> >
> > Were the mentioned series related to this commit?
> >
> > 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
>
> Oh, that did get merged? My apologies, I grepped for the hwcaps in
> arch/arm but somehow failed to spot that some definitions did exist, so
> assumed it hadn't been; not sure what went wrong there :(
>
> In that case, we definitely want this tagged as a fix, and to make sure
> we double-check for any equivalent fixes still needed for the other
> features too. Sorry again for the confusion.


I have just checked the latest open document for the aarch32 ISA:

https://developer.arm.com/documentation/ddi0597/2023-06/?lang=en

It seems that we may need 3 entries to handle all of the
instructions under the "Unconditional Advanced SIMD and floating-point instructions".

Going from the top-level to "Unconditional Advanced SIMD and floating-point instructions"
would be as follows:

L1: b'xxxx_11xx_____xxxx_xxxx_xxxx_xxxx_xxxx_xxxx -> System register access, Advanced SIMD, floating-point and Supervisor call
L2: b'1111_11(!=11)_xxxx_xxxx_xxxx_1xxx_xxxx_xxxx -> Unconditional Advanced SIMD and floating-point instructions

The code would be like:

.instr_mask = 0xff000800
.instr_val = 0xfc000800

.instr_mask = 0xff000800
.instr_val = 0xfd000800

.instr_mask = 0xff000800
.instr_val = 0xfe000800

I would appreciate any suggestions you may have regarding this approach.
Thanks!

>
> >> efforts and get this done properly; fill in *all* the hwcaps and
> >> relevant handling for extensions which Cortex-A55 supports (since
> >> there's definitely more than just VUDOT), and then hopefully we're done
> >> for good.
> >
> > Agree.
> >
> >>
> >>> Before this commit, kernel didn't handle the undef exception
> >>> caused by vudot and didn't enable VFP in lazy VFP context
> >>> switch code like other NEON instructions.
> >>> This led to the occurrence of the undefined instruction
> >>> error as following:
> >>>
> >>> [ 250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
> >>> ...
> >>> [ 250.741287 ] PC is at 0x4014ec
> >>> [ 250.741298 ] LR is at 0xb677874f
> >>> [ 250.741303 ] pc : [<004014ec>] lr : [<b677874f>] psr: 80070010
> >>> [ 250.741309 ] sp : beffedb0 ip : b67d7864 fp : beffee58
> >>> [ 250.741314 ] r10: 00000000 r9 : 00000000 r8 : 00000000
> >>> [ 250.741319 ] r7 : 00000001 r6 : 00000001 r5 : beffee90 r4 : 00401470
> >>> [ 250.741324 ] r3 : beffee20 r2 : beffee30 r1 : beffee40 r0 : 004003a8
> >>> [ 250.741331 ] Flags: Nzcv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
> >>> [ 250.741339 ] Control: 10c5383d Table: 32d0406a DAC: 00000055
> >>> [ 250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)
> >>>
> >>> Below is the assembly of the user program:
> >>>
> >>> 0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
> >>> 0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
> >>> 0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
> >>> 0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
> >>> 0x4014ec <+124>: vudot.u8 q8, q9, q10 <-- switch in, and FPEXC.EN = 0
> >>> SIGILL(illegal instruction)
> >>>
> >>> Link: https://services.arm.com/support/s/case/5004L00000XsOjP
> >>
> >> Linking to your private support case is not useful to upstream. Even I
> >> can't open that link.
> >
> > I thought that maybe someone in arm need this.
> > But it seems a bit noisy so I will remove the link from v2.
>
> Yeah, even within Arm most of us don't have permission to access the
> support system.
>
> Cheers,
> Robin.
>
> >>
> >>> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@xxxxxxxxxxxx>
> >>> ---
> >>> arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
> >>> 1 file changed, 12 insertions(+)
> >>>
> >>> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
> >>> index 7e8773a2d99d..7eab8d1019d2 100644
> >>> --- a/arch/arm/vfp/vfpmodule.c
> >>> +++ b/arch/arm/vfp/vfpmodule.c
> >>> @@ -788,6 +788,12 @@ static struct undef_hook neon_support_hook[] = {{
> >>> .cpsr_mask= PSR_T_BIT,
> >>> .cpsr_val= 0,
> >>> .fn= vfp_support_entry,
> >>> +}, {
> >>> +.instr_mask= 0xffb00000,
> >>> +.instr_val= 0xfc200000,
> >>> +.cpsr_mask= PSR_T_BIT,
> >>> +.cpsr_val= 0,
> >>> +.fn= vfp_support_entry,
> >>> }, {
> >>> .instr_mask= 0xef000000,
> >>> .instr_val= 0xef000000,
> >>> @@ -800,6 +806,12 @@ static struct undef_hook neon_support_hook[] = {{
> >>> .cpsr_mask= PSR_T_BIT,
> >>> .cpsr_val= PSR_T_BIT,
> >>> .fn= vfp_support_entry,
> >>> +}, {
> >>> +.instr_mask= 0xffb00000,
> >>> +.instr_val= 0xfc200000,
> >>> +.cpsr_mask= PSR_T_BIT,
> >>> +.cpsr_val= PSR_T_BIT,
> >>> +.fn= vfp_support_entry,
> >>
> >> Why have two entries conditional on each possible value of one bit for
> >> otherwise identical encodings? Surely it suffices to set both cpsr_mask
> >> and cpsr_val to 0?
> >
> > You're right.
> > I will set both cpsr_mask and cpsr_val to 0 and use single entry,
> > as you suggested, in the v2 patch.
> >
> > Thanks.
> >
> >>
> >> Thanks,
> >> Robin.
> >>
> >>> }};
> >>>
> >>> static struct undef_hook vfp_support_hook = {