Re: Re: [Regression or Fix]perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

From: Namhyung Kim
Date: Fri Nov 17 2023 - 16:11:19 EST


On Wed, Nov 15, 2023 at 8:09 PM David Wang <00107082@xxxxxxx> wrote:
>
>
> At 2023-11-16 00:26:06, "Namhyung Kim" <namhyung@xxxxxxxxxx> wrote:
> >On Wed, Nov 15, 2023 at 8:12 AM David Wang <00107082@xxxxxxx> wrote:
> >>
> >>
> >> At 2023-11-15 23:48:33, "Namhyung Kim" <namhyung@xxxxxxxxxx> wrote:
> >> >On Wed, Nov 15, 2023 at 3:00 AM David Wang <00107082@xxxxxxx> wrote:
> >> >>
> >> >>
> >> >>
> >> >> At 2023-11-15 18:32:41, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx> wrote:
> >> >> >
> >> >> >Namhyung, could you please take a look, you know how to operate this
> >> >> >cgroup stuff.
> >> >> >
> >> >>
> >> >> More information: I ran the profiling on an 8-CPU machine, on an SSD with an ext4 filesystem:
> >> >>
> >> >> # mkdir /sys/fs/cgroup/mytest
> >> >> # echo $$ > /sys/fs/cgroup/mytest/cgroup.procs
> >> >> ## Start profiling targeting cgroup /sys/fs/cgroup/mytest in another terminal
> >> >> # fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1
> >> >>
> >> >> I have a feeling that f06cc667f7990 decreases total samples by 10%~20% when profiling an IO benchmark within a cgroup.
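For reproducing this setup from a program rather than a shell: the `echo $$ > /sys/fs/cgroup/mytest/cgroup.procs` step works by writing a PID into the cgroup's `cgroup.procs` file, after which children started from that process (such as the fio jobs) begin life in that cgroup. Below is a minimal C sketch of that step only. `join_cgroup` is a hypothetical helper name, and the sketch assumes the cgroup v2 layout used above plus permission to write there (root in this report).

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper mirroring `echo $$ > <cgroup_dir>/cgroup.procs`.
 * Writing a PID to cgroup.procs moves that process into the cgroup;
 * children it forks afterwards (e.g. fio jobs) start in the same cgroup.
 * Returns 0 on success, -1 on any failure. */
int join_cgroup(const char *cgroup_dir, pid_t pid)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path too long */

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                      /* no such cgroup or no permission */

    int ok = fprintf(f, "%d\n", (int)pid) > 0;
    if (fclose(f) != 0)                 /* write errors may surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, calling `join_cgroup("/sys/fs/cgroup/mytest", getpid())` before exec'ing fio would mirror the two shell commands above.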
>
>
> >
> >Then what is your profiling tool? Where did you see
> >the 10%~20% drop in samples?
> >
>
> I wrote a simple/raw tool just for profiling callchains, which uses perf_event_open with the following attr:
> attr.type = PERF_TYPE_SOFTWARE;
> attr.config = PERF_COUNT_SW_CPU_CLOCK;
> attr.sample_freq = 777; // adjust it
> attr.freq = 1;
> attr.wakeup_events = 16;
> attr.sample_type = PERF_SAMPLE_TID|PERF_SAMPLE_CALLCHAIN;
> attr.sample_max_stack = 32;
>
> The source code could be found here: https://github.com/zq-david-wang/linux-tools/tree/main/perf/profiler
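The attr settings listed above can be collected into a compilable C fragment. This is a sketch, not the code from the linked repository: the helper names are hypothetical, and scoping the event to a cgroup with `PERF_FLAG_PID_CGROUP` is an assumption for illustration. That flag is the perf_event_open mode `perf record -G` relies on: the `pid` argument becomes a file descriptor of the cgroup directory, and `cpu` must name a specific CPU.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill attr with the settings quoted above: software cpu-clock event,
 * frequency-based sampling, TID + callchain recorded in each sample. */
void init_profiler_attr(struct perf_event_attr *attr, unsigned long freq)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);          /* required by the kernel ABI */
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->sample_freq = freq;            /* e.g. 777 */
    attr->freq = 1;                      /* treat sample_freq as Hz, not a period */
    attr->wakeup_events = 16;            /* wake the reader every 16 samples */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_CALLCHAIN;
    attr->sample_max_stack = 32;         /* cap callchain depth */
}

/* Hypothetical helper (assumption): open one cgroup-scoped event on one CPU.
 * In cgroup mode, cgroup_fd is an fd of the cgroup directory and cpu must be
 * >= 0, so a profiler needs one event per CPU. Returns the event fd or -1. */
int open_cgroup_event(struct perf_event_attr *attr, int cgroup_fd, int cpu)
{
    return (int)syscall(SYS_perf_event_open, attr, cgroup_fd, cpu,
                        -1 /* no group leader */, PERF_FLAG_PID_CGROUP);
}
```

A profiler built this way would open `/sys/fs/cgroup/mytest` with `O_RDONLY`, call `open_cgroup_event` once per CPU (CPUs 0 to 7 on the 8-CPU machine above), and then mmap each returned fd to read the samples.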
> >>
> >> I am not experienced with the perf tool at all; it is too complicated a tool for me... But I think I can try it.
> >
> >I feel sorry about that. In most cases, just `perf record -a` and
> >then `perf report` would work well. :)
> >
> Thanks for the information. I used the following command to profile with perf:
> `./perf record -a -e cpu-clock -G mytest`
> I have run several rounds of tests, rebooting the system before each one. The perf output is:
>
> On 6.7.0-rc1:
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 527 times to write data ]
> [ perf record: Captured and wrote 132.648 MB perf.data (2478745 samples) ]
> ---reboot
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 473 times to write data ]
> [ perf record: Captured and wrote 119.205 MB perf.data (2226994 samples) ]
>
>
> On 6.7.0-rc1 with f06cc667f79909e9175460b167c277b7c64d3df0 reverted
>
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 567 times to write data ]
> [ perf record: Captured and wrote 142.771 MB perf.data (2668224 samples) ]
> ---reboot
> $ sudo ./perf record -a -e cpu-clock -G mytest
> ^C[ perf record: Woken up 557 times to write data ]
> [ perf record: Captured and wrote 140.604 MB perf.data (2627167 samples) ]
>
>
> I also ran with `-F 777`, an arbitrary frequency I used in my tool (just to compare with my tool):
>
> On 6.7.0-rc1
> $ sudo ./perf record -a -e cpu-clock -F 777 -G mytest
> ^C[ perf record: Woken up 93 times to write data ]
> [ perf record: Captured and wrote 24.575 MB perf.data (455222 samples) ] (my tool has only ~359K samples, not stable)
>
>
> On 6.7.0-rc1 with f06cc667f79909e9175460b167c277b7c64d3df0 reverted
> $ sudo ./perf record -a -e cpu-clock -F 777 -G mytest
> ^C[ perf record: Woken up 98 times to write data ]
> [ perf record: Captured and wrote 25.703 MB perf.data (476390 samples) ] (my tool has about ~446K, stable)
>
>
> From the data I collected, I think two problems can be observed with f06cc667f79909e9175460b167c277b7c64d3df0:
> 1. Missing samples.
> 2. Unstable samples: the total sample count drifts a lot between tests.
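The two observations can be put into numbers using the totals quoted above. The sketch below is plain arithmetic with hypothetical helper names; note that every run was stopped by hand (the `^C` in the output), so these raw totals are not normalized by run time.

```c
/* Relative change of x versus a reference value, in percent. */
double pct_change(double ref, double x)
{
    return (x - ref) / ref * 100.0;
}

/* Arithmetic mean of two runs. */
double mean2(double a, double b)
{
    return (a + b) / 2.0;
}

/* Run-to-run spread of two runs: (max - min) / mean, in percent. */
double spread_pct(double a, double b)
{
    double hi = a > b ? a : b;
    double lo = a > b ? b : a;
    return (hi - lo) / mean2(a, b) * 100.0;
}

/* Mean change of 6.7.0-rc1 versus rc1 with f06cc667f799 reverted,
 * using the default-frequency `-e cpu-clock -G mytest` totals. */
double rc1_drop_pct(void)
{
    double rc1 = mean2(2478745, 2226994);
    double reverted = mean2(2668224, 2627167);
    return pct_change(reverted, rc1);
}
```

With these inputs, `rc1_drop_pct()` comes to about -11.1%, `spread_pct(2478745, 2226994)` to about 10.7% and `spread_pct(2668224, 2627167)` to about 1.6%; for the `-F 777` runs, `pct_change(476390, 455222)` is about -4.4%.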

Hmm... so the fio process was running in the background during
the profiling, right? But I'm not sure how you made sure each run
covered the same amount of time. You probably need to run this
(for 10 seconds):

sudo perf record -a -G mytest -- sleep 10
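With a fixed window such as `sleep 10`, totals from different kernels become directly comparable, and each can also be checked against roughly the most a system-wide, frequency-based run could produce. A minimal sketch, assuming the 8 CPUs mentioned earlier in the thread and perf record's default frequency of 4000 Hz (the kernel may cap it via kernel.perf_event_max_sample_rate); the helper names are hypothetical, and the ceiling is approximate because frequency mode only targets a rate.

```c
/* Rough ceiling on samples for `perf record -a` in frequency mode:
 * each CPU aims for about freq_hz samples per second, and with -G a CPU
 * only samples while a task from the target cgroup is running on it. */
double max_expected_samples(double freq_hz, int ncpus, double seconds)
{
    return freq_hz * ncpus * seconds;
}

/* Fraction of that ceiling actually captured in a run. */
double capture_ratio(double samples, double freq_hz, int ncpus, double seconds)
{
    return samples / max_expected_samples(freq_hz, ncpus, seconds);
}
```

For a 10-second run on 8 CPUs at 4000 Hz, `max_expected_samples(4000, 8, 10)` is 320000. Under the same fio load, a clearly lower ratio on one kernel than on the other would suggest missing samples rather than a different run length.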

And I guess you don't run the perf command in the target cgroup,
which is good.

And is there any chance that this is actually an improvement
brought by the change? Are the numbers in 6.7 better or worse?

Thanks,
Namhyung