Re: [PATCH] LoongArch: Fix irq enable in exception handlers

From: Jinyang He
Date: Tue Jan 03 2023 - 03:42:52 EST



On 2023-01-03 12:54, Huacai Chen wrote:
On Fri, Dec 30, 2022 at 1:58 PM Jinyang He <hejinyang@xxxxxxxxxxx> wrote:

On 2022-12-29 14:54, Qi Hu wrote:
On 2022/12/29 14:13, Jinyang He wrote:
On 2022-12-29 00:51, Qi Hu wrote:

On 2022/12/27 18:10, Jinyang He wrote:
On 2022-12-27 17:52, Huacai Chen wrote:

On Tue, Dec 27, 2022 at 5:30 PM Jinyang He <hejinyang@xxxxxxxxxxx> wrote:
On 2022-12-27 15:37, Huacai Chen wrote:
Hi, Jinyang,

What problem does moving die_if_kernel() into the irq-disabled context solve?
For stricter logic. If the code flow reaches die() in
die_if_kernel(), interrupts are enabled at that point, which means it may schedule.
So I think it is better to call die_if_kernel() first.
die_if_kernel() has been called with irqs enabled in older kernels for several
years without any problems.

I think that is because die() is never actually reached from die_if_kernel().
What I want to emphasize is that the logic here should be strict, not merely
that it has worked well in the past. I bet that if die_if_kernel() were removed,
it would still work well in the future.
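A minimal user-space model of the ordering question, assuming a few hypothetical stand-ins: the `mock_*` helpers, the `irqs_on` flag, and a PRMD.PPLV encoding in bits [1:0] with PLV3 as user mode are assumptions for illustration, not kernel APIs. It only shows that checking for a kernel-mode origin before enabling interrupts guarantees the die path runs with interrupts still masked; as with the real die(), a "death" ends the handler early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
#define MOCK_PRMD_PPLV_MASK 0x3UL /* assumed: previous privilege level, bits [1:0] */
#define MOCK_PLV_USER       3UL   /* assumed: PLV3 is user mode */

struct mock_regs {
	unsigned long csr_prmd;
};

static bool irqs_on;           /* models the current interrupt-enable state */
static bool died_with_irqs_on; /* set if the die path ran with irqs enabled */

static void mock_local_irq_enable(void)
{
	assert(!irqs_on); /* model invariant: enable/disable calls are paired */
	irqs_on = true;
}

static void mock_local_irq_disable(void)
{
	assert(irqs_on);
	irqs_on = false;
}

static bool mock_user_mode(const struct mock_regs *regs)
{
	return (regs->csr_prmd & MOCK_PRMD_PPLV_MASK) == MOCK_PLV_USER;
}

/* Returns true when the model "dies" (the real die() does not return). */
static bool mock_die_if_kernel(const struct mock_regs *regs)
{
	if (mock_user_mode(regs))
		return false;
	if (irqs_on)
		died_with_irqs_on = true;
	return true;
}

/* Old order: enable interrupts first, then check the origin. */
static void handler_enable_first(const struct mock_regs *regs)
{
	irqs_on = false; /* exception entry masks interrupts */
	mock_local_irq_enable();
	if (mock_die_if_kernel(regs))
		return;
	mock_local_irq_disable();
}

/* Order proposed in the patch: check the origin, then enable interrupts. */
static void handler_check_first(const struct mock_regs *regs)
{
	irqs_on = false;
	if (mock_die_if_kernel(regs))
		return;
	mock_local_irq_enable();
	mock_local_irq_disable();
}
```

With a kernel-mode PRMD, handler_enable_first() records a die with interrupts enabled, while handler_check_first() does not; user-mode entries behave the same under both orders.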


And the LBT exception is surely allowed to be triggered in kernel context.
I'm not familiar with LBT; I just don't see any LBT code in the kernel.
Please explain how the LBT exception is triggered, and how the kernel triggers it.
You can ask Huqi for more details, and this was discussed publicly
last week.
To: Qi Hu


Hi,


We really need some help. Can you give us some ideas?


Thanks,

Jinyang

Huacai is correct. The LBT disable exception (BTD) can be triggered
in kernel context.

If CSR.EUEN.BTE == 0 [^1], LBT instructions (those in [^2], which will
be used in the kernel) trigger the exception.

Unfortunately, fpu_{save, restore} needs some LBT instructions
[^3] [^4]. So when FPD is triggered, LBT might still be disabled,
and do_lbt() will then be called in kernel context.
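To make the BTD-in-kernel case concrete, here is a simplified model, assuming the CSR.EUEN bit layout FPE = bit 0, SXE = bit 1, ASXE = bit 2, BTE = bit 3 (our reading of the manual linked in [^1]; the `EUEN_*` macros, `check_unit()` and `fpd_handler_restore()` are hypothetical names). It ignores any precedence between the enable bits and only illustrates the sequence described above: an FPD handler enables the FPU alone, its restore sequence then executes an LBT instruction, and BTD fires while still in kernel mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed CSR.EUEN layout; hypothetical macro names. */
#define EUEN_FPE  (1u << 0)
#define EUEN_SXE  (1u << 1)
#define EUEN_ASXE (1u << 2)
#define EUEN_BTE  (1u << 3)

enum unit { UNIT_FP, UNIT_LSX, UNIT_LASX, UNIT_LBT };
enum exc  { EXC_NONE, EXC_FPD, EXC_SXD, EXC_ASXD, EXC_BTD };

/* Which "unit disabled" exception an instruction of unit u would raise. */
static enum exc check_unit(unsigned int euen, enum unit u)
{
	switch (u) {
	case UNIT_FP:   return (euen & EUEN_FPE)  ? EXC_NONE : EXC_FPD;
	case UNIT_LSX:  return (euen & EUEN_SXE)  ? EXC_NONE : EXC_SXD;
	case UNIT_LASX: return (euen & EUEN_ASXE) ? EXC_NONE : EXC_ASXD;
	case UNIT_LBT:  return (euen & EUEN_BTE)  ? EXC_NONE : EXC_BTD;
	}
	assert(0 && "unknown unit");
	return EXC_NONE;
}

/*
 * Hypothetical FPD handler: enables only the FPU, then runs a restore
 * sequence that may include an LBT instruction. Returns the exception,
 * if any, raised while the handler is still in kernel mode.
 */
static enum exc fpd_handler_restore(unsigned int *euen, bool restore_uses_lbt)
{
	*euen |= EUEN_FPE;
	enum exc e = check_unit(*euen, UNIT_FP);
	if (e != EXC_NONE)
		return e;
	if (restore_uses_lbt)
		return check_unit(*euen, UNIT_LBT);
	return EXC_NONE;
}
```

If BTE was already set before the FPD exception, the same restore completes without a nested exception.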

Hope this information helps. Thanks.


[1]
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#extended-component-unit-enable

[2]
https://github.com/loongson/linux/pull/4/files#diff-381d03cf86e2796d280e2fc82c005409d5e44b4bbbf90dd0dc17f5f0fa5553f1R140-R184

[3]
https://github.com/loongson/linux/pull/4/files#diff-381d03cf86e2796d280e2fc82c005409d5e44b4bbbf90dd0dc17f5f0fa5553f1R218-R230

[4]
https://github.com/loongson/linux/pull/4/files#diff-381d03cf86e2796d280e2fc82c005409d5e44b4bbbf90dd0dc17f5f0fa5553f1R236-R263


Hi,


That's helpful. Thanks!


But I still wonder whether SXD or ASXD could likewise be triggered
in kernel mode, by sc_save_{lsx, lasx} or elsewhere.
Do we need to remove the die_if_kernel() calls in do_lasx() and do_lsx()?


Jinyang

Hi Jinyang,

I think only BTD has this tricky situation, because there is some
overlap between FPD/SXD/ASXD and BTD.

So, in my view, neither SXD nor ASXD will be triggered in kernel mode.

Thanks.


Qi
Got it. Thanks for your help. I'll fix my patch in the next version.
In my opinion only the do_bp() modification is useful, and that part
can be squashed into Tiezhu's kprobe patches.

Yes, I have to admit that only the do_bp() modification is useful; in
fact, I think the other modified paths are never triggered. Most do_xxx()
handlers are entered either with irqs enabled or from user mode. I can write
a test in QEMU that triggers do_ri() with irqs disabled, and it hangs if
local_irq_enable() is called unconditionally, but I know that proves little,
because these paths cannot be triggered currently, just as this bug could
not be found before Tiezhu added kprobe support on LoongArch.
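The hang's cause is not stated here. One plausible mechanism, sketched as a hypothetical single-CPU model, is that the interrupted code had interrupts off because it held a lock that an interrupt handler also takes; enabling interrupts unconditionally lets that interrupt in, and it spins forever. The sketch assumes that exception entry copies CRMD.IE into PRMD.PIE and clears IE, with PIE at bit 2 (our reading of the LoongArch manual); `struct mock_cpu`, `MOCK_PRMD_PIE` and the `mock_*` functions are illustrative names only. The `conditional` flag mirrors the patch's `if (regs->csr_prmd & CSR_PRMD_PIE) local_irq_enable();`.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_PRMD_PIE (1UL << 2) /* assumed position of PRMD.PIE */

/* Hypothetical single-CPU model; not kernel code. */
struct mock_cpu {
	bool ie;                /* CRMD.IE: interrupts currently enabled */
	unsigned long csr_prmd; /* PRMD as saved at exception entry */
	bool lock_held;         /* interrupted code holds a lock with irqs off */
	bool irq_pending;       /* pending irq whose handler takes that lock */
};

/* Exception entry: save IE into PRMD.PIE and mask interrupts. */
static void mock_exception_entry(struct mock_cpu *c)
{
	if (c->ie)
		c->csr_prmd |= MOCK_PRMD_PIE;
	else
		c->csr_prmd &= ~MOCK_PRMD_PIE;
	c->ie = false;
}

/* True if delivering the pending irq now would spin on a lock this CPU holds. */
static bool mock_deliver_irq(struct mock_cpu *c)
{
	if (!c->ie || !c->irq_pending)
		return false;
	c->irq_pending = false;
	return c->lock_held;
}

/* Runs a do_ri()-like handler body; returns true if it would hang. */
static bool mock_handler(struct mock_cpu *c, bool conditional)
{
	bool hang;

	mock_exception_entry(c);
	assert(!c->ie); /* entry must leave interrupts masked */
	if (!conditional || (c->csr_prmd & MOCK_PRMD_PIE))
		c->ie = true;   /* local_irq_enable() */
	hang = mock_deliver_irq(c);
	c->ie = false;          /* local_irq_disable() before exit */
	return hang;
}
```

With interrupts off at entry, the unconditional variant reports a hang and the conditional one does not; with interrupts on at entry, both enable interrupts as before.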

If leaving potentially illogical code is acceptable, squash it into Tiezhu's
kprobe patches.

Jinyang


Huacai

Jinyang


Qi

Huacai
Thanks,

Jinyang


Huacai

On Wed, Dec 21, 2022 at 3:43 PM Jinyang He <hejinyang@xxxxxxxxxxx> wrote:
The previous interrupt state can be read from regs->csr_prmd. If
interrupts were disabled before an exception that can be triggered in
kernel mode, its handler shouldn't enable interrupts, so enable them
conditionally. For those do_\exception handlers which cannot be
triggered in kernel mode but need interrupts enabled, call
die_if_kernel() first. do_lsx, do_lasx and do_lbt cannot be
triggered in kernel mode either.

Signed-off-by: Jinyang He <hejinyang@xxxxxxxxxxx>
---
arch/loongarch/kernel/traps.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/loongarch/kernel/traps.c b/arch/loongarch/kernel/traps.c
index 1ea14f6c18d3..3ac7b32d1e15 100644
--- a/arch/loongarch/kernel/traps.c
+++ b/arch/loongarch/kernel/traps.c
@@ -340,9 +340,9 @@ asmlinkage void noinstr do_fpe(struct pt_regs *regs, unsigned long fcsr)

/* Clear FCSR.Cause before enabling interrupts */
write_fcsr(LOONGARCH_FCSR0, fcsr & ~mask_fcsr_x(fcsr));
- local_irq_enable();

die_if_kernel("FP exception in kernel code", regs);
+ local_irq_enable();

sig = SIGFPE;
fault_addr = (void __user *) regs->csr_era;
@@ -432,7 +432,8 @@ asmlinkage void noinstr do_bp(struct pt_regs *regs)
unsigned long era = exception_era(regs);
irqentry_state_t state = irqentry_enter(regs);

- local_irq_enable();
+ if (regs->csr_prmd & CSR_PRMD_PIE)
+ local_irq_enable();
current->thread.trap_nr = read_csr_excode();
if (__get_inst(&opcode, (u32 *)era, user))
goto out_sigsegv;
@@ -514,7 +515,8 @@ asmlinkage void noinstr do_ri(struct pt_regs *regs)
unsigned int __user *era = (unsigned int __user *)exception_era(regs);
irqentry_state_t state = irqentry_enter(regs);

- local_irq_enable();
+ if (regs->csr_prmd & CSR_PRMD_PIE)
+ local_irq_enable();
current->thread.trap_nr = read_csr_excode();

if (notify_die(DIE_RI, "RI Fault", regs, 0, current->thread.trap_nr,
@@ -606,8 +608,8 @@ asmlinkage void noinstr do_fpu(struct pt_regs *regs)
{
irqentry_state_t state = irqentry_enter(regs);

- local_irq_enable();
die_if_kernel("do_fpu invoked from kernel context!", regs);
+ local_irq_enable();
BUG_ON(is_lsx_enabled());
BUG_ON(is_lasx_enabled());

@@ -623,13 +625,13 @@ asmlinkage void noinstr do_lsx(struct pt_regs *regs)
{
irqentry_state_t state = irqentry_enter(regs);

+ die_if_kernel("do_lsx invoked from kernel context!", regs);
local_irq_enable();
if (!cpu_has_lsx) {
force_sig(SIGILL);
goto out;
}

- die_if_kernel("do_lsx invoked from kernel context!", regs);
BUG_ON(is_lasx_enabled());

preempt_disable();
@@ -645,14 +647,13 @@ asmlinkage void noinstr do_lasx(struct pt_regs *regs)
{
irqentry_state_t state = irqentry_enter(regs);

+ die_if_kernel("do_lasx invoked from kernel context!", regs);
local_irq_enable();
if (!cpu_has_lasx) {
force_sig(SIGILL);
goto out;
}

- die_if_kernel("do_lasx invoked from kernel context!", regs);
-
preempt_disable();
init_restore_lasx();
preempt_enable();
@@ -666,6 +667,7 @@ asmlinkage void noinstr do_lbt(struct pt_regs *regs)
{
irqentry_state_t state = irqentry_enter(regs);

+ die_if_kernel("do_lbt invoked from kernel context!", regs);
local_irq_enable();
force_sig(SIGILL);
local_irq_disable();
@@ -677,7 +679,6 @@ asmlinkage void noinstr do_reserved(struct pt_regs *regs)
{
irqentry_state_t state = irqentry_enter(regs);

- local_irq_enable();
/*
* Game over - no way to handle this if it ever occurs. Most probably
* caused by a fatal error after another hardware/software error.
@@ -685,8 +686,8 @@ asmlinkage void noinstr do_reserved(struct pt_regs *regs)
pr_err("Caught reserved exception %u on pid:%d [%s] - should not happen\n",
read_csr_excode(), current->pid, current->comm);
die_if_kernel("do_reserved exception", regs);
+ local_irq_enable();
force_sig(SIGUNUSED);
-
local_irq_disable();

irqentry_exit(regs, state);
--
2.34.3