On Thu, Nov 09, 2023 at 08:24:58PM +0100, Andrea Parri wrote:
Mathieu, all,
Sorry for the delay,
AFAIR this patch implements sync_core_before_usermode which gets used by
membarrier_mm_sync_core_before_usermode() to handle the uthread->kthread->uthread
case. It relies on switch_mm issuing a core serializing instruction as well.
Looking at RISC-V switch_mm(), I see that switch_mm() calls:
flush_icache_deferred(next, cpu);
which only issues a fence.i if a deferred icache flush was required. We're
missing the part that sets the icache_stale_mask cpumask bits when a
MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is invoked.
[...]
The only part where I think you may want to keep some level of deferred
icache flushing as you do now is as follows:
- when membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is invoked,
call a new architecture hook which sets cpumask bits in the mm context
that tells the next switch_mm on each cpu to issue fence.i for that mm.
- keep something like flush_icache_deferred as you have now.
Otherwise, I fear the overhead of a very expensive fence.i would be too
much when processes registering with MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE
and start doing fence.i on each and every switch_mm().
So you'd basically rely on membarrier to only issue IPIs to the CPUs which are
currently running threads belonging to the mm, and handle the switch_mm with
the sync_core_before_usermode() for uthread->kthread->uthread case, and implement
a deferred icache flush for the typical switch_mm() case.
I've (tried to) put this together and obtained the two patches reported below.
Please let me know if this aligns with your intentions and/or there's interest
in a proper submission.