Re: [PATCH v2] riscv: Add support for BATCHED_UNMAP_TLB_FLUSH

From: Nam Cao
Date: Mon Jan 08 2024 - 15:45:01 EST


On Mon, 8 Jan 2024 21:43:46 +0100 Nam Cao <namcao@xxxxxxxxxxxxx> wrote:

> On Mon, 8 Jan 2024 20:36:40 +0100 Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> wrote:
> > Allow the TLB flush to be deferred when unmapping pages, which reduces
> > both the number of IPIs and the number of sfence.vma instructions.
> >
> > The microbenchmark used in commit 43b3dfdd0455 ("arm64: support
> > batched/deferred tlb shootdown during page reclamation/migration"),
> > made multithreaded to force the use of IPIs, shows a good performance
> > improvement on all platforms:
> >
> > * Unmatched: ~34%
> > * TH1520   : ~78%
> > * QEMU     : ~81%
> >
> > In addition, perf on QEMU reports a significant decrease in the time
> > spent dealing with IPIs:
> >
> > Before: 68.17% main [kernel.kallsyms] [k] __sbi_rfence_v02_call
> > After : 8.64% main [kernel.kallsyms] [k] __sbi_rfence_v02_call
> >
> > * Benchmark:
> >
> > #define _GNU_SOURCE
> > #include <errno.h>
> > #include <pthread.h>
> > #include <sched.h>
> > #include <stdint.h>
> > #include <stdio.h>
> > #include <string.h>
> > #include <sys/mman.h>
> > #include <unistd.h>
> >
> > /* SIZE (length of the mapping in bytes) is not defined in this snippet. */
> >
> > /* Pin the calling thread to the given core. */
> > int stick_this_thread_to_core(int core_id) {
> >     int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
> >     if (core_id < 0 || core_id >= num_cores)
> >         return EINVAL;
> >
> >     cpu_set_t cpuset;
> >     CPU_ZERO(&cpuset);
> >     CPU_SET(core_id, &cpuset);
> >
> >     pthread_t current_thread = pthread_self();
> >     return pthread_setaffinity_np(current_thread,
> >                                   sizeof(cpu_set_t), &cpuset);
> > }
> >
> > /* Idle thread: it only exists so that other cores share the mm. */
> > static void *fn_thread(void *p_data)
> > {
> >     stick_this_thread_to_core((int)(intptr_t)p_data);
> >
> >     while (1) {
> >         sleep(1);
> >     }
> >
> >     return NULL;
> > }
> >
> > int main(void)
> > {
> >     volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> >                                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> >     pthread_t threads[4];
> >     int ret;
> >
> >     if (p == MAP_FAILED) {
> >         perror("mmap");
> >         return 1;
> >     }
> >
> >     for (int i = 0; i < 4; ++i) {
> >         ret = pthread_create(&threads[i], NULL, fn_thread,
> >                              (void *)(intptr_t)i);
> >         if (ret)
> >         {
> >             printf("%s", strerror(ret));
> >         }
> >     }
> >
> >     memset((void *)p, 0x88, SIZE);
> >
> >     for (int k = 0; k < 10000; k++) {
> >         /* swap in */
> >         for (int i = 0; i < SIZE; i += 4096) {
> >             (void)p[i];
> >         }
> >
> >         /* swap out */
> >         madvise((void *)p, SIZE, MADV_PAGEOUT);
> >     }
> >
> >     for (int i = 0; i < 4; i++)
> >     {
> >         pthread_cancel(threads[i]);
> >     }
> >
> >     for (int i = 0; i < 4; i++)
> >     {
> >         pthread_join(threads[i], NULL);
> >     }
> >
> >     return 0;
> > }
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx>
> > Reviewed-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> > Tested-by: Jisheng Zhang <jszhang@xxxxxxxxxx> # Tested on TH1520
>
> Before:
> real 0m36.674s
> user 0m0.173s
> sys 0m36.493s
> After:
> real 0m18.016s
> user 0m0.125s
> sys 0m17.885s
>
> Tested-by: Nam Cao <namcao@xxxxxxxxxxxxx>

I forgot to mention: this was measured on StarFive's VisionFive 2 board.

Best regards,
Nam