Re: [PATCH] irqchip/gic-v3: do runtime cpu cap check only when necessary

From: Marc Zyngier
Date: Sun Aug 28 2022 - 13:11:25 EST


On Sun, 28 Aug 2022 08:56:23 +0100,
Puyou Lu <puyou.lu@xxxxxxxxx> wrote:
>
> On Sat, Aug 27, 2022 at 04:13:00PM +0100, Marc Zyngier wrote:
> > On Sat, 27 Aug 2022 06:19:27 +0100,
> > Puyou Lu <puyou.lu@xxxxxxxxx> wrote:
> > >
> > > Currently the cpu cap check is done on every exception on every arm64
> > > platform, but this check is necessary on just a few of them, so we can
> > > drop this check at compile time on the others. This can decrease
> > > exception handle time on most cases.
> > >
> > > Fixes: 6d4e11c5e2e8 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154")
> > > Signed-off-by: Puyou Lu <puyou.lu@xxxxxxxxx>
> > > ---
> > > drivers/irqchip/irq-gic-v3.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > index 262658fd5f9e..3f08c2ef1251 100644
> > > --- a/drivers/irqchip/irq-gic-v3.c
> > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > @@ -237,9 +237,11 @@ static void gic_redist_wait_for_rwp(void)
> > >
> > >  static u64 __maybe_unused gic_read_iar(void)
> > >  {
> > > +#ifdef CONFIG_CAVIUM_ERRATUM_23154
> > >  	if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_23154))
> > >  		return gic_read_iar_cavium_thunderx();
> > >  	else
> > > +#endif
> > >  		return gic_read_iar_common();
> > >  }
> > >  #endif
> >
> > You realise that cpus_have_const_cap() results purely in a couple of
> > branches once the caps have been finalised, right?
> >
> > Please provide data showing that it actually "can decrease exception
> > handle time on most cases", because I'm pretty sure you cannot measure
> > the difference in any meaningful way.
> >
> > M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
>
> Hi Marc,
> Thank you for the reply. Actually I did not run any tests. From the
> disassembled code of vmlinux, I saw about 6 instructions generated by
> cpus_have_const_cap, and about 36 by gic_read_iar_cavium_thunderx, which
> are useless for most CPUs. I think this will waste some cpu cycles, as
> exceptions can occur hundreds or thousands of times per second. Also
> (6+36)*4=168 bytes of icache are wasted, and icache misses increase
> somewhere else.
> If I got things wrong, please correct me.

Well, what you got wrong is that these instructions are stepped over
by two branches once the caps are finalised, and that doesn't appear
in the disassembly (you need to look at the code that is actually
executed).
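
To illustrate the difference between the static disassembly and the code
that is actually executed, here is a minimal userspace sketch. It is only
a model: the kernel implements these checks as static keys whose
instructions are patched at runtime, whereas the sketch uses plain
booleans. Every name in it (model_caps_finalised, model_cap_23154,
model_read_iar and friends) is hypothetical and does not exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not kernel code. Each flag stands in for a static
 * key that the kernel patches into a NOP or a branch at runtime. */
static bool model_caps_finalised;	/* set once CPU capabilities are final */
static bool model_cap_23154;		/* set only on affected CPUs */
static unsigned int model_slow_lookups;	/* counts pre-finalisation lookups */

/* Stand-ins for the two IAR read paths; return values are arbitrary tags. */
static uint64_t model_read_iar_common(void)   { return 0x100; }
static uint64_t model_read_iar_thunderx(void) { return 0x200; }

/* Models the capability test: once finalised, only two branches are taken. */
static bool model_have_cap(void)
{
	if (!model_caps_finalised) {	/* branch 1: are the caps final yet? */
		model_slow_lookups++;	/* early boot: slower lookup path */
		return model_cap_23154;
	}
	return model_cap_23154;		/* branch 2: is this cap set? */
}

static uint64_t model_read_iar(void)
{
	if (model_have_cap())
		return model_read_iar_thunderx();
	return model_read_iar_common();
}

/* Sets the flags as they would be on an unaffected CPU after boot. */
static void model_finalise_unaffected(void)
{
	model_cap_23154 = false;
	model_caps_finalised = true;
}
```

In this model, after model_finalise_unaffected() a call to
model_read_iar() never reaches the ThunderX path, even though that code
is still present in the binary. That is the sense in which counting
instructions in the vmlinux disassembly does not tell you what is
executed.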

Now, any optimisation of the sort must be backed by some performance
numbers. If you can show that this has a meaningful impact on a given
workload, I'm happy to look into it. But only if you can show that
data.
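
As a rough idea of the shape such a measurement could take, here is a
minimal userspace timing sketch. It is an assumption-laden illustration,
not a benchmark of the GIC driver: bench_with_check and
bench_without_check are hypothetical stand-ins, and real numbers for this
patch would have to come from the kernel itself, on actual hardware,
under a real workload.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the two code paths being compared. */
static volatile int bench_flag;		/* always 0 here, read on each call */
static volatile uint64_t bench_sink;	/* keeps the loop from being optimised away */

static uint64_t bench_with_check(void)
{
	if (bench_flag)			/* the extra test under discussion */
		return 2;
	return 1;
}

static uint64_t bench_without_check(void)
{
	return 1;
}

/* Returns the average CPU time per call, in nanoseconds, over 'iters' calls. */
static double bench_ns_per_call(uint64_t (*fn)(void), unsigned long iters)
{
	clock_t start = clock();

	for (unsigned long i = 0; i < iters; i++)
		bench_sink += fn();

	clock_t end = clock();

	return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Calling bench_ns_per_call() on both variants with the same iteration
count, and repeating the runs several times, gives two per-call figures
to compare. If the difference is lost in the run-to-run noise, it cannot
be called a meaningful impact.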

Thanks,

M.

--
Without deviation from the norm, progress is not possible.