Re: [Regression or Fix] perf: profiling stats significantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

From: Namhyung Kim
Date: Mon Nov 20 2023 - 17:59:33 EST


On Fri, Nov 17, 2023 at 5:48 PM David Wang <00107082@xxxxxxx> wrote:
>
>
> At 2023-11-18 05:11:02, "Namhyung Kim" <namhyung@xxxxxxxxxx> wrote:
> >On Wed, Nov 15, 2023 at 8:09 PM David Wang <00107082@xxxxxxx> wrote:
> >>
> >> From the data I collected, I think two problems can be observed with commit f06cc667f79909e9175460b167c277b7c64d3df0:
> >> 1. Samples are missing.
> >> 2. Sampling is unstable: the total sample count drifts a lot between tests.
> >
> >Hmm.. so the fio process was running in the background during
> >the profiling, right? But I'm not sure how you measured the same
> >amount of time. Probably you need to run this (for 10 seconds):
> >
> > sudo perf record -a -G mytest -- sleep 10
> >
> >And I guess you don't run the perf command in the target cgroup
> >which is good.
> >
>
> Yes, the profiling process was not in the target cgroup.
> I run fio with `fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=600 --numjobs=4 --time_based=1`, which runs for 600 seconds.
> There is drift in the profiling reports between runs. The small set of test data I collected may not be enough for a firm conclusion, but my impression is that when the commit is reverted, the expected total sample count is higher and the standard deviation is smaller.
>
> >And is there any chance if it's improved because of the change?
> >Are the numbers in 6.7 better or worse?
> >
> I have no idea whether the change in the expected total sample count is a bug or a fix, but I think the observed result, that the total sample count drifts a lot between runs (a bigger standard deviation), is a bad thing.

Right. Can you run perf stat to measure the number of context
switches and cgroup switches, then?

sudo perf stat -a -e context-switches,cgroup-switches -- sleep 10
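
To compare the run-to-run drift on the two kernels, the per-run totals can be reduced to a mean, a standard deviation, and a coefficient of variation. This is only a minimal sketch: the function name `summarize_counts` is hypothetical, and it assumes you have already collected one total (sample count or switch count) per run of identical length.

```python
import statistics


def summarize_counts(counts):
    """Return (mean, sample standard deviation, coefficient of variation).

    `counts` is assumed to hold one total per profiling run, with all runs
    taken over the same duration (hypothetical input, not from the thread).
    """
    if len(counts) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.mean(counts)
    stdev = statistics.stdev(counts)  # sample (n - 1) standard deviation
    # Relative spread, so kernels with different mean counts can be compared.
    cv = stdev / mean if mean else float("nan")
    return mean, stdev, cv
```

Feeding it the totals from several identical 10-second runs per kernel gives one figure for each; a larger coefficient of variation on one kernel would indicate more drift there.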

Thanks,
Namhyung