Re: [PATCH v2] RISC-V: Probe misaligned access speed in parallel

From: Evan Green
Date: Thu Sep 21 2023 - 13:00:28 EST


On Thu, Sep 21, 2023 at 9:44 AM Evan Green <evan@xxxxxxxxxxxx> wrote:
>
> On Thu, Sep 21, 2023 at 3:22 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > ...
> > > > For probing alignment speed, you just care about running it on that
> > > > cpu. Correct ?
> > >
> > > For this we care both about not migrating to other CPUs, and also
> > > secondarily minimizing disturbances while the test is being run.
> > > Usually I equate pre-emption with migration, but in this case I think
> > > the worker threads are bound to that CPU. So I'll keep the
> > > preempt_disable/enable where it is, since it's harmless for CPUs other
> > > than 0, but useful for 0. I also like it for readability as it
> > > highlights the critical section (as a reader, "is preemption disabled"
> > > would be one of my first questions when studying this).
> >
> > You need to disable pre-emption to get any kind of meaningful answer.
> >
> > But why do you need to run the test on more than the boot cpu?
> > If you've a heterogeneous mix of cpus, any code that looks at the answer
> > is going to behave incorrectly unless it has also disabled pre-emption
> > or is bound to a cpu.
>
> I don't think it's safe to assume misaligned access speed is the same
> across all cores. In a big.LITTLE combination I can easily imagine the
> big cores having fast misaligned access and the slow cores not having
> it (though hopefully the slow cores don't kick it to firmware). Since
> this info is presented to usermode per-cpu, I'd like it to be correct.
>
> >
> > One obvious use of the result is to set up some static branches.
> > But that assumes all cpus are the same.
>
> Right, this could be used to set up static branches, or in an ifunc
> selector. This is why we provide pre-computed answers for "all CPUs"
> in hwprobe. If the situation I describe above did happen, code asking

Correction: these are not exactly precomputed answers. It's cached vDSO
data that can quickly answer usermode queries on systems with
homogeneous CPUs.
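
To make the homogeneous vs. heterogeneous distinction concrete, here is a
rough sketch of how a single "all CPUs" answer could be derived from
per-CPU results. This is illustration only, not the kernel or vDSO code:
the enum and the function name are made up for this email, and the only
assumption is that each CPU has already recorded its own probe result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for per-CPU misaligned access probe results. */
enum misaligned_perf {
	MISALIGNED_UNKNOWN = 0,
	MISALIGNED_SLOW,
	MISALIGNED_FAST,
};

/*
 * Collapse per-CPU results into one answer for a set of CPUs.
 * If every CPU reports the same value, return it; if any CPU
 * disagrees, return MISALIGNED_UNKNOWN so callers don't assume
 * a speed that only holds on some cores.
 */
enum misaligned_perf
misaligned_perf_for_cpus(const enum misaligned_perf *per_cpu, size_t ncpus)
{
	enum misaligned_perf common;
	size_t i;

	assert(per_cpu != NULL && ncpus > 0);

	common = per_cpu[0];
	for (i = 1; i < ncpus; i++) {
		if (per_cpu[i] != common)
			return MISALIGNED_UNKNOWN;
	}
	return common;
}
```

On a homogeneous system every entry matches, so the answer can be cached
once and handed out cheaply. On a mixed system the sketch falls back to
"unknown", and code that cares would have to ask about a specific CPU
and stay bound to it.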