RE: [PATCH 1/2] perf mmap: Fix perf backward recording

From: Liang, Kan
Date: Wed Nov 01 2017 - 12:23:02 EST


> On 2017/11/1 21:57, Liang, Kan wrote:
> >> On 2017/11/1 20:00, Namhyung Kim wrote:
> >>> On Wed, Nov 01, 2017 at 06:32:50PM +0800, Wangnan (F) wrote:
> >>>> On 2017/11/1 17:49, Namhyung Kim wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On Wed, Nov 01, 2017 at 05:53:26AM +0000, Wang Nan wrote:
> >>>>>> perf record backward recording doesn't work as we expected: it
> >>>>>> never overwrite when ring buffer full.
> >>>>>>
> >>>>>> Test:
> >>>>>>
> >>>>>> (Run a busy printing python task background like this:
> >>>>>>
> >>>>>> while True:
> >>>>>> print 123
> >>>>>>
> >>>>>> send SIGUSR2 to perf to capture snapshot.)
> >>>> [SNIP]
> >>>>
> >>>>>> Signed-off-by: Wang Nan <wangnan0@xxxxxxxxxx>
> >>>>>> ---
> >>>>>> tools/perf/util/evlist.c | 8 +++++++-
> >>>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >>>>>> index c6c891e..4c5daba 100644
> >>>>>> --- a/tools/perf/util/evlist.c
> >>>>>> +++ b/tools/perf/util/evlist.c
> >>>>>> @@ -799,22 +799,28 @@ perf_evlist__should_poll(struct perf_evlist
> >> *evlist __maybe_unused,
> >>>>>> }
> >>>>>> static int perf_evlist__mmap_per_evsel(struct perf_evlist
> >>>>>> *evlist, int
> >> idx,
> >>>>>> - struct mmap_params *mp, int
> cpu_idx,
> >>>>>> + struct mmap_params *_mp, int
> cpu_idx,
> >>>>>> int thread, int *_output, int
> >> *_output_backward)
> >>>>>> {
> >>>>>> struct perf_evsel *evsel;
> >>>>>> int revent;
> >>>>>> int evlist_cpu = cpu_map__cpu(evlist->cpus, cpu_idx);
> >>>>>> + struct mmap_params *mp;
> >>>>>> evlist__for_each_entry(evlist, evsel) {
> >>>>>> struct perf_mmap *maps = evlist->mmap;
> >>>>>> + struct mmap_params rdonly_mp;
> >>>>>> int *output = _output;
> >>>>>> int fd;
> >>>>>> int cpu;
> >>>>>> + mp = _mp;
> >>>>>> if (evsel->attr.write_backward) {
> >>>>>> output = _output_backward;
> >>>>>> maps = evlist->backward_mmap;
> >>>>>> + rdonly_mp = *_mp;
> >>>>>> + rdonly_mp.prot &= ~PROT_WRITE;
> >>>>>> + mp = &rdonly_mp;
> >>>>>> if (!maps) {
> >>>>>> maps =
> perf_evlist__alloc_mmap(evlist);
> >>>>>> --
> >>>>> What about this instead (not tested)?
> >>>>>
> >>>>>
> >>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >>>>> index c6c891e154a6..27ebe355e794 100644
> >>>>> --- a/tools/perf/util/evlist.c
> >>>>> +++ b/tools/perf/util/evlist.c
> >>>>> @@ -838,6 +838,11 @@ static int
> perf_evlist__mmap_per_evsel(struct
> >> perf_evlist *evlist, int idx,
> >>>>> if (*output == -1) {
> >>>>> *output = fd;
> >>>>> + if (evsel->attr.write_backward)
> >>>>> + mp->prot = PROT_READ;
> >>>>> + else
> >>>>> + mp->prot = PROT_READ | PROT_WRITE;
> >>>>> +
> >>>> If evlist->overwrite is true, PROT_WRITE should be unset even if
> >>>> write_backward is not set. If you want to delay the setting of
> >>>> mp->prot, you need to consider both evlist->overwrite and
> >>>> evsel->attr.write_backward.
> >>> I thought evsel->attr.write_backward should be set when
> >>> evlist->overwrite is set. Do you mean following case?
> >>>
> >>> perf record --overwrite -e 'cycles/no-overwrite/'
> >>>
> >> No. evlist->overwrite is unrelated to '--overwrite'. This is why I
> >> said the concept of 'overwrite' and 'backward' is ambiguous.
> >>
> > Yes, I think we should make it clear.
> >
> > As we discussed previously, there are four possible combinations to
> > access ring buffer , 'forward non-overwrite', 'forward overwrite',
> > 'backward non-overwrite' and 'backward overwrite'.
> >
> > Actually, not all of the combinations are necessary.
> > - 'forward overwrite' mode brings some problems which were mentioned
> > in commit ID 9ecda41acb97 ("perf/core: Add ::write_backward attribute
> > to perf event").
> > - 'backward non-overwrite' mode is very similar as 'forward non-overwrite'.
> > There is no extra advantage. Only keep one non-overwrite mode is
> enough.
> > So 'forward non-overwrite' and 'backward overwrite' are enough for all
> perf tools.
> >
> > Furthermore, 'forward' and 'backward' only indicate the direction of
> > the ring buffer. They don't impact the result and performance. It is
> > not important as the concept of overwrite/non-overwrite.
> >
> > To simplify the concept, only 'non-overwrite' mode and 'overwrite'
> > mode should be kept. 'non-overwrite' mode indicates the forward ring
> > buffer. 'overwrite' mode indicates the backward ring buffer.
> >
> >> perf record never sets 'evlist->overwrite'. What '--overwrite'
> >> actually does is setting write_backward. Some testcases needs overwrite
> evlist.
> >>
> > There are only four test cases which set overwrite,
> > sw-clock,task-exit, mmap-basic, backward-ring-buffer.
> > Only backward-ring-buffer is 'backward overwrite'.
> > The rest three are all 'forward overwrite'. We just need to set
> > write_backward to convert them to 'backward overwrite'.
> > I think it's not hard to clean up.
>
> If we add a new rule that overwrite ring buffers are always backward then it
> is not hard to cleanup. However, the support of forward overwrite ring
> buffer has a long history and the code is not written by me. I'd like to check if
> there is some reason to keep support this configuration?
>

As my observation, currently, there are no perf tools which support
'forward overwrite'.
There are only three test cases (sw-clock, task-exit, mmap-basic) which
is set to 'forward overwrite'. I donât see any reason it cannot be changed to
'backward overwrite'

Arnaldo? Jirka? Kim?

What do you think?

Thanks,
Kan