Re: [External] Re: [PATCH] RISC-V: add uniprocessor flush_tlb_range() support

From: yunhui cui
Date: Mon Jan 29 2024 - 22:18:49 EST


Hi Jisheng,


On Mon, Jan 29, 2024 at 8:05 PM Jisheng Zhang <jszhang@xxxxxxxxxx> wrote:
>
> On Mon, Jan 29, 2024 at 07:02:10PM +0800, yunhui cui wrote:
> > Hi Jisheng,
> >
> > On Mon, Jan 29, 2024 at 5:51 PM Jisheng Zhang <jszhang@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jan 29, 2024 at 04:26:57PM +0800, yunhui cui wrote:
> > > > Hi Jisheng,
> > > >
> > > > On Mon, Jan 29, 2024 at 4:02 PM Jisheng Zhang <jszhang@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Jan 25, 2024 at 02:20:44PM +0800, Yunhui Cui wrote:
> > > > > > Add support for flush_tlb_range() to improve TLB performance for
> > > > > > UP systems. In order to avoid the mutual inclusion of tlbflush.h
> > > > > > and hugetlb.h, the UP part is also implemented in tlbflush.c.
> > > > >
> > > > > Hi Yunhui,
> > > > >
> > > > > IIRC, Samuel sent similar patch series a few weeks ago.
> > > > >
> > > > > https://lore.kernel.org/linux-riscv/20240102220134.3229156-1-samuel.holland@xxxxxxxxxx/
> > > > >
> > > > > After that series, do you still need this patch?
> > > >
> > > > Thank you for your reminder. I didn't find it before I mailed my
> > > > patch. I just looked at the content of this patch. I understand that
> > > > my patch is needed. For a single core, a more concise TLB flush logic
> > > > is needed, and it is helpful to improve performance.
> > >
> > > Currently, riscv UP flush_tlb_range still use flush all TLB entries,
> > > obviously it's is a big hammer, this is what your patch is trying to
> > > optimize. I'm not sure whether I understand your code correctly or not.
> > > Let me know if I misunderstand your code.
> > >
> > > After patch5 of the Samuel's series, __flush_tlb_range is unified for
> > > SMP and UP, so that UP can also benefit from recent improvements, such
> > > as range flush rather than all.
> >
> > In my opinion, UP does not need to combine some SMP if... else,
> > on_each_cpu(...) logic, which is also a manifestation of performance
>
> Hi Yunhui,
>
> IIRC, the compiler will optimise out the unnecessary logic under UP, I
> may misread the code. But if no, indeed, there's improvement room.
> However, even in this case, IMHO, it's better if you can base on
> Samuel's series.
> Anyway, the optimization(range tlb entries rather than *all* entries under
> UP case) you want to do has been implemented. While I'm not sure whether
> we can rely on the compiler to optimize out all unnecessary logics.

Okay, let's see if there are any necessary optimizations based on
Samuel's series.

Thanks,
Yunhui