Re: [PATCH 2/2] riscv: fix the IPI missing issue in nommu mode

From: Palmer Dabbelt
Date: Wed Mar 18 2020 - 21:45:17 EST


On Tue, 03 Mar 2020 01:34:18 PST (-0800), greentime.hu@xxxxxxxxxx wrote:
This patch fixes a missing IPI (inter-processor interrupt) issue. IPIs were
lost because for_each_cpu() was iterated over hartid_mask, but the cpumask
and the hartid_mask are not always the same. In my case the IPI was never
sent to hartid 4, because hartid 4 was skipped by the for_each_cpu() loop.

This can be reproduced on the QEMU sifive_u machine with the following command:
qemu-system-riscv64 -nographic -smp 5 -m 1G -M sifive_u -kernel \
arch/riscv/boot/loader

The boot hangs in csd_lock_wait(csd) because csd_unlock(csd) is never
called: hartid 4 never receives the IPI that would release the lock. The
calling hart never sends the IPI to hartid 4 because hartid 4 is skipped by
for_each_cpu(), as the loop condition "(cpu) < nr_cpu_ids" is false when the
hartid is 4 and nr_cpu_ids is 4. Therefore for_each_cpu() should iterate
over the cpumask instead of the hartid_mask.

	/* Send a message to all CPUs in the map */
	arch_send_call_function_ipi_mask(cfd->cpumask_ipi);

	if (wait) {
		for_each_cpu(cpu, cfd->cpumask) {
			call_single_data_t *csd;
			csd = per_cpu_ptr(cfd->csd, cpu);
			csd_lock_wait(csd);
		}
	}

	for ((cpu) = -1;				\
		(cpu) = cpumask_next((cpu), (mask)),	\
		(cpu) < nr_cpu_ids;)
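
The skip can also be seen in isolation with a minimal userspace sketch (not
kernel code), assuming the sifive_u layout described above, i.e. four CPUs
mapped to hartids 1..4 and nr_cpu_ids == 4; the cpuid-to-hartid table below
is purely illustrative:

/*
 * Minimal userspace sketch (not kernel code) of the failure mode.
 * The cpuid -> hartid table is hypothetical and only mirrors the
 * sifive_u case above: four CPUs on hartids 1..4, nr_cpu_ids == 4.
 */
#include <stdio.h>

#define NR_CPU_IDS 4

static const unsigned long cpuid_to_hartid[NR_CPU_IDS] = { 1, 2, 3, 4 };

int main(void)
{
	unsigned long hartid_mask = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		hartid_mask |= 1UL << cpuid_to_hartid[cpu];

	/* Buggy: iterate hartids, but bound the loop by nr_cpu_ids. */
	printf("iterating hartid_mask:");
	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (hartid_mask & (1UL << cpu))
			printf(" %d", cpu);	/* prints 1 2 3; hart 4 is skipped */
	printf("\n");

	/* Fixed: iterate CPU ids and translate when sending the IPI. */
	printf("iterating cpu ids:   ");
	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		printf(" %lu", cpuid_to_hartid[cpu]);	/* prints 1 2 3 4 */
	printf("\n");

	return 0;
}

Iterating CPU ids and translating them reaches hartid 4, which is what the
change below does with cpuid_to_hartid_map().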

It boots to the login console after this patch is applied.

Fixes: b2d36b5668f6 ("riscv: provide native clint access for M-mode")
Signed-off-by: Greentime Hu <greentime.hu@xxxxxxxxxx>
---
 arch/riscv/include/asm/clint.h | 8 ++++----
 arch/riscv/kernel/smp.c        | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/clint.h b/arch/riscv/include/asm/clint.h
index 6eaa2eedd694..a279b17a6aad 100644
--- a/arch/riscv/include/asm/clint.h
+++ b/arch/riscv/include/asm/clint.h
@@ -15,12 +15,12 @@ static inline void clint_send_ipi_single(unsigned long hartid)
 	writel(1, clint_ipi_base + hartid);
 }
 
-static inline void clint_send_ipi_mask(const struct cpumask *hartid_mask)
+static inline void clint_send_ipi_mask(const struct cpumask *mask)
 {
-	int hartid;
+	int cpu;
 
-	for_each_cpu(hartid, hartid_mask)
-		clint_send_ipi_single(hartid);
+	for_each_cpu(cpu, mask)
+		clint_send_ipi_single(cpuid_to_hartid_map(cpu));
 }
 
 static inline void clint_clear_ipi(unsigned long hartid)
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index eb878abcaaf8..e0a6293093f1 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -96,7 +96,7 @@ static void send_ipi_mask(const struct cpumask *mask, enum ipi_message_type op)
 	if (IS_ENABLED(CONFIG_RISCV_SBI))
 		sbi_send_ipi(cpumask_bits(&hartid_mask));
 	else
-		clint_send_ipi_mask(&hartid_mask);
+		clint_send_ipi_mask(mask);
 }
 
 static void send_ipi_single(int cpu, enum ipi_message_type op)

Thanks. We should really stop putting hart IDs in cpumasks, as that's just
nonsense.

Reviewed-by: Palmer Dabbelt <palmerdabbelt@xxxxxxxxxx>

I'm taking these both onto fixes.