Re: [PATCH v2] RISC-V: Probe misaligned access speed in parallel

From: Evan Green
Date: Wed Sep 20 2023 - 18:07:29 EST


On Wed, Sep 20, 2023 at 2:27 PM Atish Patra <atishp@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Sep 20, 2023 at 2:04 PM Conor Dooley <conor@xxxxxxxxxx> wrote:
> >
> > Yo,
> >
> > On Wed, Sep 20, 2023 at 12:38:01PM -0700, Evan Green wrote:
> > > Probing for misaligned access speed takes about 0.06 seconds. On a
> > > system with 64 cores, doing this in smp_callin() means it's done
> > > serially, extending boot time by 3.8 seconds. That's a lot of boot time.
> > >
> > > Instead of measuring each CPU serially, let's do the measurements on
> > > all CPUs in parallel. If we disable preemption on all CPUs, the
> > > jiffies stop ticking, so we can do this in stages of 1) everybody
> > > except core 0, then 2) core 0.
> > >
> > > The measurement call in smp_callin() stays around, but is now
> > > conditionalized to only run if a new CPU shows up after the round of
> > > in-parallel measurements has run. The goal is to have the measurement
> > > call not run during boot or suspend/resume, but only on a hotplug
> > > addition.
> > >
> > > Reported-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> > > Closes: https://lore.kernel.org/all/mhng-9359993d-6872-4134-83ce-c97debe1cf9a@palmer-ri-x1c9/T/#mae9b8f40016f9df428829d33360144dc5026bcbf
> > > Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
> > > Signed-off-by: Evan Green <evan@xxxxxxxxxxxx>
> > > Reviewed-by: Andrew Jones <ajones@xxxxxxxxxxxxxxxx>
> > > Tested-by: Andrew Jones <ajones@xxxxxxxxxxxxxxxx>
> > >
> > > ---
> > >
> > > Changes in v2:
> > > - Removed new global, used system_state == SYSTEM_RUNNING instead
> > > (Jisheng)
> > > - Added tags
> > >
> > > arch/riscv/include/asm/cpufeature.h | 2 +-
> > > arch/riscv/kernel/cpufeature.c | 22 +++++++++++++++++-----
> > > arch/riscv/kernel/smpboot.c | 11 ++++++++++-
> > > 3 files changed, 28 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> > > index d0345bd659c9..b139796392d0 100644
> > > --- a/arch/riscv/include/asm/cpufeature.h
> > > +++ b/arch/riscv/include/asm/cpufeature.h
> > > @@ -30,6 +30,6 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
> > > /* Per-cpu ISA extensions. */
> > > extern struct riscv_isainfo hart_isa[NR_CPUS];
> > >
> > > -void check_unaligned_access(int cpu);
> > > +int check_unaligned_access(void *unused);
> > >
> > > #endif
> > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > > index 1cfbba65d11a..40bb854fcb96 100644
> > > --- a/arch/riscv/kernel/cpufeature.c
> > > +++ b/arch/riscv/kernel/cpufeature.c
> > > @@ -556,8 +556,9 @@ unsigned long riscv_get_elf_hwcap(void)
> > > return hwcap;
> > > }
> > >
> > > -void check_unaligned_access(int cpu)
> > > +int check_unaligned_access(void *unused)
> > > {
> > > + int cpu = smp_processor_id();
> > > u64 start_cycles, end_cycles;
> > > u64 word_cycles;
> > > u64 byte_cycles;
> > > @@ -571,7 +572,7 @@ void check_unaligned_access(int cpu)
> > > page = alloc_pages(GFP_NOWAIT, get_order(MISALIGNED_BUFFER_SIZE));
> > > if (!page) {
> > > pr_warn("Can't alloc pages to measure memcpy performance");
> > > - return;
> > > + return 0;
> >
> > Dumb question maybe, but I am limited setup wise at the moment due to
> > a hardware failure which makes checking stuff hard, why the signature
> > change? Requirement for on_each_cpu()?
> >
>
> Requirement for smp_call_on_cpu.

Right.

>
> > > }
> > >
> > > /* Make an unaligned destination buffer. */
> > > @@ -643,15 +644,26 @@ void check_unaligned_access(int cpu)
> > >
> > > out:
> > > __free_pages(page, get_order(MISALIGNED_BUFFER_SIZE));
> > > + return 0;
> > > +}
> > > +
> > > +static void check_unaligned_access_nonboot_cpu(void *param)
> > > +{
> > > + if (smp_processor_id() != 0)
> > > + check_unaligned_access(param);
> > > }
> > >
> > > -static int check_unaligned_access_boot_cpu(void)
> > > +static int check_unaligned_access_all_cpus(void)
> > > {
> > > - check_unaligned_access(0);
> > > + /* Check everybody except 0, who stays behind to tend jiffies. */
> > > + on_each_cpu(check_unaligned_access_nonboot_cpu, NULL, 1);
> > > +
> > > + /* Check core 0. */
> > > + smp_call_on_cpu(0, check_unaligned_access, NULL, true);
> > > return 0;
> >
> > Why does this function return an int if it can only return 0?
> >

This is a requirement on the initcall_t function pointer type.

>
> Should we define a check_unaligned_access_boot_cpu to avoid the return
> type change ?
> We can also get rid of the unused parameter in the
> check_unaligned_access function.

I don't think it matters too much either way. In my mind it sorta
shifts the ballast around but doesn't really decrease it.

>
> I also noticed on_each_cpu invokes the function within preempt_disable/enable.
> check_unaligned_access will invoke it again. It's probably harmless
> but there is no need for non-boot cpus.
>
> We can just have preempt_disable/enable around check_unaligned_access
> in the check_unaligned_access_boot_cpu function.

It is harmless. I haven't walked fully through smp_call_on_cpu(), but
I'd be intuitively quite surprised if preemption were enabled on those
callbacks too (as you'd be saying "run my function on this cpu, but no
guarantee it will stay there!"). So in theory I should be able to
remove the preempt_disable/enable entirely. As a side note: I added
the smp_call_on_cpu() because I couldn't convince myself the initcalls
would stay on cpu 0. In practice they always seem to be.


-Evan