Re: [PATCH v2] cpufreq: intel_pstate: Avoid initializing variables prematurely

From: Rafael J. Wysocki
Date: Tue May 23 2023 - 09:37:43 EST


On Tue, May 23, 2023 at 2:20 PM srinivas pandruvada
<srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2023-05-23 at 13:08 +0200, Rafael J. Wysocki wrote:
> > On Tue, May 23, 2023 at 10:51 AM Fieah Lim <kweifat@xxxxxxxxx> wrote:
> > >
> > > We should avoid initializing some rather expensive data
> > > when the function has a chance to bail out earlier
> > > before actually using it.
> > > This applies to the following initializations:
> > >
> > > - cpudata *cpu = all_cpu_data; (in everywhere)
> > > - this_cpu = smp_processor_id(); (in notify_hwp_interrupt)
> > > - hwp_cap = READ_ONCE(cpu->hwp_cap_cached); (in
> > > intel_pstate_hwp_boost_up)
> > >
> > > These initializations are premature because there is a chance
> > > that the function will bail out before actually using the data.
> > > I think this qualifies as a micro-optimization,
> > > especially in such a hot path.
> > >
> > > While at it, tidy up how and when we initialize
> > > all of the cpu_data pointers, for the sake of consistency.
> > >
> > > A side note on the intel_pstate_cpu_online change:
> > > we simply don't have to initialize cpudata just
> > > for the pr_debug, while policy->cpu is being there.
> > >
> > > Signed-off-by: Fieah Lim <kweifat@xxxxxxxxx>
> > > ---
> > > V1 -> V2: Rewrite changelog for better explanation.
> > >
>
> [...]
>
> > > void notify_hwp_interrupt(void)
> > > {
> > > - unsigned int this_cpu = smp_processor_id();
> > > + unsigned int this_cpu;
> > > struct cpudata *cpudata;
> > > unsigned long flags;
> > > u64 value;
> > > @@ -1591,6 +1593,8 @@ void notify_hwp_interrupt(void)
> > > if (!(value & 0x01))
> > > return;
> > >
> > > + this_cpu = smp_processor_id();
> > > +
> > > spin_lock_irqsave(&hwp_notify_lock, flags);
> > >
> > > if (!cpumask_test_cpu(this_cpu, &hwp_intr_enable_mask))
> >
> > This is a place where it may really matter for performance, but how
> > much? Can you actually estimate this?
>
> If DEBUG_PREEMPT is defined
> ~12 instructions (most of them with latency = 1 in dependency chain)

I really meant "estimate the effect of this change on performance",
because I'm not sure if it is going to be visible in any test.

But yes, skipping it if not needed at least makes some sense.