Re: [PATCH RT v2] arm64: fpsimd: use a local_lock() in addition to local_bh_disable()

From: Mike Galbraith
Date: Wed Jul 18 2018 - 06:31:17 EST


See pseudo-patch below. That cures the reported gcc gripeage.

On Sun, 2018-07-15 at 09:22 +0200, Mike Galbraith wrote:
> On Sat, 2018-07-14 at 00:03 +0200, Mike Galbraith wrote:
> > On Fri, 2018-07-13 at 19:49 +0200, Sebastian Andrzej Siewior wrote:
> > > In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
> > > code disables BH and expects that it is not preemptible. On -RT the
> > > task remains preemptible but remains the same CPU. This may corrupt the
> > > content of the SIMD registers if the task is preempted during
> > > saving/restoring those registers.
> > > Add a locallock around this process. This avoids that the any function
> > > within the locallock block is invoked more than once on the same CPU.
> > >
> > > The kernel_neon_begin() can't be kept preemptible. If the task-switch notices
> > > TIF_FOREIGN_FPSTATE then it would restore task's SIMD state and we lose the
> > > state of registers used for in-kernel-work. We would require additional storage
> > > for the in-kernel copy of the registers. But then the NEON-crypto checks for
> > > the need-resched flag so it shouldn't that bad.
> > > The preempt_disable() avoids the context switch while the kernel uses the SIMD
> > > registers. Unfortunately we have to balance out the migrate_disable() counter
> > > because local_lock_bh() is invoked in different context compared to its unlock
> > > counterpart.
> > >
> > > __efi_fpsimd_begin() should not use kernel_fpu_begin() due to its
> > > preempt_disable() context and instead save the registers always in its
> > > extra spot on RT.
> > >
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> > > ---
> > >
> > > This seems to make work (crypto chacha20-neon + cyclictest). I have no
> > > EFI so I have no clue if saving SIMD while calling to EFI works.
> >
> > All is not well on cavium test box. I'm seeing random errors ala...
> >
> > ./include/linux/fs.h:3137:11: internal compiler error: Segmentation fault
> > ./include/linux/bio.h:175:1: internal compiler error: in grokdeclarator, at c/c-decl.c:7023
> >
> > ...during make -j96 (2*cpus) kbuild. Turns out 4.14-rt has this issue
> > as well, which is unsurprising if it's related to fpsimd woes. Box
> > does not exhibit the issue with NONRT kernels, PREEMPT or NOPREEMPT.
>
> Verified to be SIMD woes. I backported your V2 to 4.14-rt, and the
> CPUS*2 kbuild still reliably reproduced the corruption issue. I then
> did the below to both 4.14-rt and 4.16-rt, and the corruption is gone.
>
> (this looks a bit like a patch, but is actually a functional yellow
> sticky should I need to come back for another poke at it later)
>
> arm64: fpsimd: disable preemption for RT where that is assumed
>
> 1. Per Sebastian's analysis, kernel_neon_begin() can't be made preemptible:
> If the task-switch notices TIF_FOREIGN_FPSTATE then it would restore task's
> SIMD state and we lose the state of registers used for in-kernel-work. We
> would require additional storage for the in-kernel copy of the registers.
> But then the NEON-crypto checks for the need-resched flag so it shouldn't
> that bad.
>
> 2. arch_efi_call_virt_setup/teardown() encapsulate __efi_fpsimd_begin/end()
> in preempt disabled sections via efi_virtmap_load/unload(). That could be
> fixed, but...
>
> 3. A local lock solution which left preempt disabled sections 1 & 2 intact
> failed, CPUS*2 parallel kbuild reliably reproduced memory corruption.
>
> Given the two non-preemptible sections which could encapsulate something
> painful remained intact with the local lock solution, and the fact that
> the remaining BH disabled sections are all small, with empirical evidence
> at hand that at LEAST one truely does require preemption be disabled,
> the best solution for both RT and !RT is to simply disable preemption for
> RT where !RT assumes preemption has been disabled. That adds no cycles
> to the !RT case, fewer cycles to the RT case, and requires no (ugly) work
> around for the consequences of local_unlock() under preempt_disable().
>
> Signed-off-by: Mike Galbraith <efault@xxxxxx>
> ---
> arch/arm64/kernel/fpsimd.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -594,6 +594,7 @@ int sve_set_vector_length(struct task_st
> * non-SVE thread.
> */
> if (task == current) {
> + preempt_disable_rt();
> local_bh_disable();
>
> task_fpsimd_save();
> @@ -604,8 +605,10 @@ int sve_set_vector_length(struct task_st
> if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
> sve_to_fpsimd(task);
>
> - if (task == current)
> + if (task == current) {
> local_bh_enable();
> + preempt_enable_rt();
> + }
>
> /*
> * Force reallocation of task SVE state to the correct size
> @@ -837,6 +840,7 @@ asmlinkage void do_sve_acc(unsigned int
>
> sve_alloc(current);
>
> + preempt_disable_rt();
> local_bh_disable();
>
> task_fpsimd_save();
> @@ -850,6 +854,7 @@ asmlinkage void do_sve_acc(unsigned int
> WARN_ON(1); /* SVE access shouldn't have trapped */
>
> local_bh_enable();
> + preempt_enable_rt();
> }
>
> /*
> @@ -925,6 +930,7 @@ void fpsimd_flush_thread(void)
> if (!system_supports_fpsimd())
> return;
>
> + preempt_disable_rt();
> local_bh_disable();
>
> memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
> @@ -968,6 +974,7 @@ void fpsimd_flush_thread(void)
> set_thread_flag(TIF_FOREIGN_FPSTATE);
>
> local_bh_enable();
> + preempt_enable_rt();
> }
>
> /*
> @@ -979,9 +986,11 @@ void fpsimd_preserve_current_state(void)
> if (!system_supports_fpsimd())
> return;
>
> + preempt_disable_rt();
> local_bh_disable();
> task_fpsimd_save();
> local_bh_enable();
> + preempt_enable_rt();
> }
>
> /*
> @@ -1021,6 +1030,7 @@ void fpsimd_restore_current_state(void)
> if (!system_supports_fpsimd())
> return;
>
> + preempt_disable_rt();
> local_bh_disable();
>
> if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> @@ -1029,6 +1039,7 @@ void fpsimd_restore_current_state(void)
> }
>
> local_bh_enable();
> + preempt_enable_rt();
> }
>
> /*
> @@ -1041,6 +1052,7 @@ void fpsimd_update_current_state(struct
> if (!system_supports_fpsimd())
> return;
>
> + preempt_disable_rt();
> local_bh_disable();
>
> current->thread.fpsimd_state.user_fpsimd = *state;
> @@ -1053,6 +1065,7 @@ void fpsimd_update_current_state(struct
> fpsimd_bind_to_cpu();
>
> local_bh_enable();
> + preempt_enable_rt();
> }
>
> /*
> @@ -1115,6 +1128,7 @@ void kernel_neon_begin(void)
>
> BUG_ON(!may_use_simd());
>
> + preempt_disable();
> local_bh_disable();
>
> __this_cpu_write(kernel_neon_busy, true);
> @@ -1128,8 +1142,6 @@ void kernel_neon_begin(void)
> /* Invalidate any task state remaining in the fpsimd regs: */
> fpsimd_flush_cpu_state();
>
> - preempt_disable();
> -
> local_bh_enable();
> }
> EXPORT_SYMBOL(kernel_neon_begin);