Re: [PATCH] perf test: Remove atomics from test_loop to avoid test failures

From: Leo Yan
Date: Sun Nov 26 2023 - 02:41:30 EST


On Sat, Nov 25, 2023 at 07:10:25PM +0000, Nick Forrington wrote:

[...]

> > @Nick, could you narrow down which specific test case causing the
> > failure.
> >
> > [...]
>
> All checks for ${testsym} in record.sh (including the example you provide)
> can fail, as the expected symbol (test_loop) is not the top-most function on
> the stack (and therefore not the symbol associated with the sample).
>
>
> Example perf report output:
>
> # Overhead  Command  Shared Object          Symbol
> # ........  .......  ..................... .............................
> #
>     99.53%  perf     perf                   [.] __aarch64_ldadd4_relax

Thanks for confirmation.

I cannot reproduce the issue on my Juno board with Debian (buster).
It's likely because I don't use GCC toolchain with enabling
'-moutline-atomics' AArch64 flag [1].

> ...
>
>
> You can see the issue when recording/reporting with call stacks:
>
> # Children      Self  Command  Shared Object          Symbol
> # ........  ........  .......  .....................
> ..........................................................
> #
>     99.52%    99.52%  perf     perf                   [.]
> __aarch64_ldadd4_relax
>             |
>             |--49.77%--0xffffb905a5dc
>             |          0xffffb8ff0aec
>             |          thfunc
>             |          test_loop
>             |          __aarch64_ldadd4_relax



> ...
>
> >
> > > I believe that it was there to prevent the compiler to optimize the loop
> > > out or some reason like that. Hopefully, it will work even without that
> > > on all architectures with all compilers that are used for building perf...
> > Agreed.
> >
> > As said above, I'd like to step back a bit for making clear what's the
> > exactly failure caused by the program.
>
> I don't think this loop could be sensibly optimised away, as it depends on
> "done", which is defined at file scope (and assigned by a signal handler).

I verified your patch with 'perf annotate'.

The disassembly on Arm64 is:

noinline void test_loop(void)
{
stp x29, x30, [sp, #-32]!
mov x29, sp
adrp x0, options+0x4c0
ldr x0, [x0, #3664]
ldr x1, [x0]
str x1, [sp, #24]
mov x1, #0x0 // #0
while (!done);
nop
20: adrp x0, spacing.22251
add x0, x0, #0xa04
ldr w0, [x0]
cmp w0, #0x0
↑ b.eq 20
}

The disassembly on x86_64 is:

noinline void test_loop(void)
{
endbr64
push %rbp
mov %rsp,%rbp
sub $0x10,%rsp
mov %fs:0x28,%rax
mov %rax,-0x8(%rbp)
xor %eax,%eax
while (!done);
nop
1c: mov done,%eax
test %eax,%eax
↑ je 1c
}


Maybe the commit log caused a bit confusion, the problem is after
enabling "-moutline-atomics" on aarch64, the overhead is altered into
the linked __aarch64_ldadd4_relax() function, test_loop() cannot be
sampled anymore, but it's not about stack tracing.

Anyway, the patch is fine for me.

Thanks,
Leo

[1] https://stackoverflow.com/questions/65239845/how-to-enable-mno-outline-atomics-aarch64-flag