Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

From: Andrea Parri
Date: Thu Feb 08 2024 - 06:43:17 EST


> +static int __ftrace_modify_code(void *data)
> +{
> + struct ftrace_modify_param *param = data;
> +
> + if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
> + ftrace_modify_all_code(param->command);
> + atomic_inc(&param->cpu_count);

I stared at ftrace_modify_all_code() for a bit but honestly I don't see
what prevents the ->cpu_count increment from being reordered before the
insn write(s) (architecturally) now that you have removed the IPI dance:
perhaps add an smp_wmb() right before the atomic_inc() (or promote this
latter to a (void)atomic_inc_return_release()) and/or an inline comment
saying why such reordering is not possible?


> + } else {
> + while (atomic_read(&param->cpu_count) <= num_online_cpus())
> + cpu_relax();
> + smp_mb();

I see that you've lifted/copied the memory barrier from patch_text_cb():
what's its point? AFAIU, the barrier has no ordering effect on program
order later insn fetches; perhaps the code was based on some legacy/old
version of Zifencei? IAC, comments, comments, ... or maybe just remove
that memory barrier?


> + }
> +
> + local_flush_icache_all();
> +
> + return 0;
> +}

[...]


> @@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
> if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
> for (i = 0; ret == 0 && i < patch->ninsns; i++) {
> len = GET_INSN_LENGTH(patch->insns[i]);
> - ret = patch_text_nosync(patch->addr + i * len,
> - &patch->insns[i], len);
> + ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
> }
> atomic_inc(&patch->cpu_count);
> } else {
> @@ -242,6 +239,8 @@ static int patch_text_cb(void *data)
> smp_mb();
> }
>
> + local_flush_icache_all();
> +
> return ret;
> }
> NOKPROBE_SYMBOL(patch_text_cb);

My above remarks/questions also apply to this function.


On a last topic, although somehow orthogonal to the scope of this patch,
I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
correct: I can see why we may want (need to do) the local TLB flush be-
fore returning from patch_{map,unmap}(), but does a local flush suffice?
For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb)
sequence in their unmapping stage (and apparently relying on "no caching
of invalid ptes" in their mapping stage). Of course, "broadcasting" our
(riscv's) TLB invalidations will necessary introduce some complexity...

Thoughts?

Andrea