RE: [PATCH 4/6] perf: Optimize get_recursion_context()

From: David Laight
Date: Mon Nov 09 2020 - 09:14:53 EST

> -----Original Message-----
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Sent: 09 November 2020 12:13
> To: David Laight <David.Laight@xxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>; Jesper Dangaard Brouer <brouer@xxxxxxxxxx>;
> mingo@xxxxxxxxxx; tglx@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; kan.liang@xxxxxxxxxxxxxxx;
> acme@xxxxxxxxxx; mark.rutland@xxxxxxx; alexander.shishkin@xxxxxxxxxxxxxxx; jolsa@xxxxxxxxxx;
> namhyung@xxxxxxxxxx; ak@xxxxxxxxxxxxxxx; eranian@xxxxxxxxxx
> Subject: Re: [PATCH 4/6] perf: Optimize get_recursion_context()
>
> On Sat, Oct 31, 2020 at 12:11:42PM +0000, David Laight wrote:
> > The gcc 7.5.0 I have handy probably generates the best code for:
> >
> > unsigned char q_2(unsigned int pc)
> > {
> > 	unsigned char rctx = 0;
> >
> > 	rctx += !!(pc & (NMI_MASK));
> > 	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
> > 	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
> >
> > 	return rctx;
> > }
> >
> > 0000000000000000 <q_2>:
> >    0:   f7 c7 00 00 f0 00    test   $0xf00000,%edi    # clock 0
> >    6:   0f 95 c0             setne  %al               # clock 1
> >    9:   f7 c7 00 00 ff 00    test   $0xff0000,%edi    # clock 0
> >    f:   0f 95 c2             setne  %dl               # clock 1
> >   12:   01 c2                add    %eax,%edx         # clock 2
> >   14:   81 e7 00 01 ff 00    and    $0xff0100,%edi
> >   1a:   0f 95 c0             setne  %al
> >   1d:   01 d0                add    %edx,%eax         # clock 3
> >   1f:   c3                   retq
> >
> > I doubt that is beatable.
> >
> > I've annotated the register dependency chain.
> > Likely to be 3 (or maybe 4) clocks.
> > The other versions are a lot worse (7 or 8) without allowing
> > for 'sbb' taking 2 clocks on a lot of Intel cpus.
>
> https://godbolt.org/z/EfnG8E
>
> Recent GCC just doesn't want to do that. Still, using u8 makes sense, so
> I've kept that.

u8 helps x86 because 'setne' only writes the low 8 bits of the
register, so a u8 result needs no extra zero-extend.
I guess that seemed a good idea when the instruction was added (386).
Using u8 doesn't seem to make the other architectures much worse.
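
As a rough illustration of the return-type point (not code from the
patch; flag_u8(), flag_u32(), check_flag() and FLAG_MASK are made-up
names), the two functions below differ only in the width of the result.
Compiling both with -O2 and comparing the output should show whether the
wider one picks up an extra xor or movzbl; the exact code depends on the
compiler version.

```c
#include <assert.h>

/* Hypothetical mask; same value as the first immediate tested in q_2. */
#define FLAG_MASK 0x00f00000u

/*
 * u8 result: on x86 the compiler can typically 'test' and then
 * 'setne' straight into the low byte of the return register.
 */
unsigned char flag_u8(unsigned int pc)
{
	return !!(pc & FLAG_MASK);
}

/*
 * 32-bit result: 'setne' leaves bits 8..31 untouched, so the
 * compiler typically has to clear or zero-extend the register too.
 */
unsigned int flag_u32(unsigned int pc)
{
	return !!(pc & FLAG_MASK);
}

/* Both variants return the same value; only the generated code differs. */
void check_flag(unsigned int pc)
{
	assert(flag_u8(pc) == flag_u32(pc));
}
```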

gcc 10.x can be persuaded to generate the above code.

https://godbolt.org/z/6GoT94
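
For anyone who wants to try other compilers without the kernel headers,
a self-contained sketch of q_2() follows. The mask values are
assumptions read off the immediates in the disassembly above (0xf00000,
0xff0000, 0xff0100), not copied from the headers, and q_ref() and
check_q_2() are hypothetical helpers that only cross-check the result.

```c
#include <assert.h>

/* Assumed values, inferred from the immediates in the disassembly. */
#define NMI_MASK	0x00f00000u
#define HARDIRQ_MASK	0x000f0000u
#define SOFTIRQ_OFFSET	0x00000100u

/* Branch-free version quoted earlier in the thread. */
unsigned char q_2(unsigned int pc)
{
	unsigned char rctx = 0;

	rctx += !!(pc & (NMI_MASK));
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));

	return rctx;
}

/* Hypothetical branchy reference, only used to cross-check q_2(). */
unsigned char q_ref(unsigned int pc)
{
	if (pc & NMI_MASK)
		return 3;
	if (pc & HARDIRQ_MASK)
		return 2;
	if (pc & SOFTIRQ_OFFSET)
		return 1;
	return 0;
}

/* Check every single-bit input, then zero and one combined input. */
void check_q_2(void)
{
	unsigned int bit;

	for (bit = 0; bit < 32; bit++)
		assert(q_2(1u << bit) == q_ref(1u << bit));
	assert(q_2(0) == q_ref(0));
	assert(q_2(NMI_MASK | SOFTIRQ_OFFSET) == 3);
}
```

The result is 0 when none of the tested bits are set, 1 for softirq,
2 for hardirq and 3 for NMI, since each higher level also satisfies the
tests for the lower ones.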

It sometimes seems to me that every new version of gcc is
larger, slower and generates worse code than the previous one.

David
