Re: [PATCH 1/2] perf: Add munmap callback

From: Stephane Eranian
Date: Tue Nov 06 2018 - 10:00:30 EST


On Mon, Nov 5, 2018 at 7:43 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On 11/5/2018 5:59 AM, Stephane Eranian wrote:
> > Hi Kan,
> >
> > I built a small test case for you to demonstrate the issue for code and data.
> > Compile the test program and then do:
> > For text:
> > $ perf record ./mmap
> > $ perf report -D | fgrep MMAP2
> >
> > The test program mmaps 2 pages, unmaps the second, and remap 1 page
> > over the freed space.
> > If you look at the MMAP2 record, you will not be able to reconstruct
> > what happened and perf will
> > get confused should it try to symbolize from the address range
> >
> > With Text:
> > PERF_RECORD_MMAP2 5937/5937: [0x400000(0x1000) @ 0 08:01 400938
> > 824817672]: r-xp /home/eranian/mmap
> > PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
> > 00:00 0 0]: rwxp //anon
> > PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
> > 00:00 0 0]: rwxp //anon
> >
> > ^^^^^^^^^^^^^^^^^^^^^^^^ captures the whole VMA but not the mapping
> > change in user space
> >
> > For data:
> > $ perf record -d ./mmap
> > $ perf report -D | fgrep MMAP2
> > With data:
> > PERF_RECORD_MMAP2 6430/6430: [0x400000(0x1000) @ 0 08:01 400938
> > 3278843184]: r-xp /home/eranian/mmap
> > PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
> > 00:00 0 0]: rw-p //anon
> > PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
> > 00:00 0 0]: rw-p //anon
> >
> > Same test case with data.
> > Perf will think the entire 2 pages have been replaced when in fact
> > only the second has.
> > I believe the problem is likely to impact data and jitted code cache
> >
> > #include <sys/types.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > #include <err.h>
> > #include <getopt.h>
> >
> > int main(int argc, char **argv)
> > {
> > void *addr1, *addr2;
> > size_t pgsz = sysconf(_SC_PAGESIZE);
> > int n = 2;
> > int ret;
> > int c, mode = 0;
> >
> > while ((c = getopt(argc, argv, "hd")) != -1) {
> > switch (c) {
> > case 'h':
> > printf("[-h]\tget this help\n");
> > printf("[-d]\tuse data mmaps (no PROT_EXEC)\n");
> > return 0;
> > case 'd':
> > mode = PROT_EXEC;
> > break;
> > default:
> > errx(1, "unknown option");
> > }
> > }
> > /* default to data */
> > if (mode == 0)
> > mode = PROT_WRITE;
> >
> > /*
> > * mmap 2 contiugous pages
> > */
> > addr1 = mmap(NULL, n * pgsz, PROT_READ| mode, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> > if (addr1 == (void *)MAP_FAILED)
> > err(1, "mmap 1 failed");
> >
> > printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);
> >
> > /*
> > * unmap only the second page
> > */
> > ret = munmap(addr1 + pgsz, pgsz);
> > if (ret == -1)
> > err(1, "munmp failed");
> >
> > /*
> > * mmap 1 page at the location of the unmap page (should reuse virtual space)
> > * This creates a continuous region built from two mmaps and
> > potentially two different sources
> > * especially with jitted runtimes
> > */
>
> The two mmaps are both anon. As my understanding, we cannot symbolize
> from the anonymous address, can we?
Can't we build the same test case using an actual file mapping (both
mmap from the same file)?

> If we cannot, why we have to distinguish with them? I think we do not
> need to know their sources for symbolization.
>
> As my understanding, only --jit can inject MMAP event, which tag an
> anon. Perf can symbolize the address after that. Then the unmap is needed.
>
Yes, perf inject --jit injects timestamped MMAP2 records to cover the
jitted regions
which helps symbolize anons.

> Thanks,
> Kan
> > addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ|PROT_WRITE | mode,
> > MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> >
> > printf("addr2=%p\n", addr2);
> >
> > if (addr2 == (void *)MAP_FAILED)
> > err(1, "mmap 2 failed");
> > if (addr2 != (addr1 + pgsz))
> > errx(1, "wrong mmap2 address");
> >
> > sleep(1);
> >
> > return 0;
> > }
> >
> > On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 10/24/2018 3:30 PM, Stephane Eranian wrote:
> >>> The need for this new record type extends beyond physical address conversions
> >>> and PEBS. A long while ago, someone reported issues with symbolization related
> >>> to perf lacking munmap tracking. It had to do with vma merging. I think the
> >>> sequence of mmaps was as follows in the problematic case:
> >>> 1. addr1 = mmap(8192);
> >>> 2. munmap(addr1 + 4096, 4096)
> >>> 3. addr2 = mmap(addr1+4096, 4096)
> >>>
> >>> If successful, that yields addr2 = addr1 + 4096 (could also get the
> >>> same without forcing the address).
> >>>
> >>> In that case, if I recall correctly, the vma for 1st mapping (now at
> >>> 4k) and that of the 2nd mapping (4k)
> >>> get merged into a single 8k vma and this is what perf_events will
> >>> record for PERF_RECORD_MMAP.
> >>> On the perf tool side, it is assumed that if two timestamped mappings
> >>> overlap then, the latter overrides
> >>> the former. In this case, perf would loose the mapping of the first
> >>> 4kb and assume all symbols comes from
> >>> 2nd mapping. Hopefully I got the scenario right. If so, then you'd
> >>> need PERF_RECORD_UNMAP to
> >>> disambiguate assuming the perf tool is modified accordingly.
> >>>
> >>
> >> Hi Stephane and Peter,
> >>
> >> I went through the link(https://lkml.org/lkml/2017/1/27/452). I'm trying
> >> to understand the problematic case.
> >>
> >> It looks like the issue can only be triggered by perf inject --jit.
> >> Because it can inject extra MMAP events.
> >> As my understanding, Linux kernel only try to merge VMAs if they are
> >> both from anon or they are both from the same file. --jit breaks the
> >> rule, and makes the merged VMA partly from anon, partly from file.
> >> Now, there is a new MMAP event which range covers the modified VMA.
> >> Without the help of MUNMAP event, perf tool have no idea if the new one
> >> is a newly merged VMA (modified VMA + a new VMA) or a brand new VMA.
> >> Current code just simply overwrite the modified VMAs. The VMA
> >> information which --jit injected may be lost. The symbolization may be
> >> lost as well.
> >>
> >> Except --jit, the VMAs information should be consistent between kernel
> >> and perf tools. We shouldn't observe the problem. MUNMAP event is not
> >> needed.
> >>
> >> Is my understanding correct?
> >>
> >> Do you have a test case for the problem?
> >>
> >> Thanks,
> >> Kan