Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec

From: Jiri Olsa
Date: Tue Feb 23 2016 - 16:44:47 EST


On Tue, Feb 23, 2016 at 04:48:49PM +0100, Jiri Olsa wrote:
> On Tue, Feb 23, 2016 at 04:27:29PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 19, 2016 at 03:37:47PM +0100, Peter Zijlstra wrote:
> > > Oleg reported that enable_on_exec results in weird scale factors.
> > >
> > > The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> > > scheduling") caused this by moving task_ctx_sched_out() from before
> > > __perf_event_mask_enable() to after it.
> > >
> > > The overlooked concequence of that change is that task_ctx_sched_out()
> > > would update the ctx time fields, and now __perf_event_mask_enable()
> > > uses stale time.
> > >
> > > Fix this by adding an explicit time update.
> > >
> > > While looking at this, I also found that we need an ctx->is_active
> > > check in perf_install_in_context().
> > >
> > > XXX: does this actually fix the reported issue? I'm not sure what the
> > > reproduction case is. Also an earlier version made Jiri's machine
> > > explode -- something I've not managed to reproduce either.
> >
> > Jiri, can you have a look at this and perhaps share the reproducer?
>
> yep, I'm testing this patchset, but got stuck with 'crash' tool to get
> some reasonable output.. got stuck on unrelated sched deadlock ;-)
>
> the reproducer is described in this email:
> http://marc.info/?l=linux-kernel&m=145568006709552&w=2

so I finally got some reasonable backtrace and figured that crash finally:

#7 [ffff8802751afcd0] general_protection at ffffffff817a69e8
[exception RIP: special_mapping_fault+47]
RIP: ffffffff811e40df RSP: ffff8802751afd88 RFLAGS: 00010282
RAX: ffff8802747e8b68 RBX: 00007fffffffe080 RCX: c4712d0070657267
RDX: ffff8802751afd98 RSI: ffff8802742c4f00 RDI: ffff8802747e8b68
RBP: ffff8802751afd88 R8: 0000000000000000 R9: ffff8802751afe58
R10: 00000000000001fe R11: 00003fffffe00000 R12: ffff8802742c4f00
R13: ffff8802751afe58 R14: 0000000000000000 R15: ffff880273f59ff8
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#8 [ffff8802751afd90] __do_fault at ffffffff811db505
#9 [ffff8802751afdf8] handle_mm_fault at ffffffff811e0b03
#10 [ffff8802751afec8] __do_page_fault at ffffffff8106734a
#11 [ffff8802751aff20] do_page_fault at ffffffff810675df
#12 [ffff8802751aff50] page_fault at ffffffff817a6a48


it was caused by:
- f872f5400cc0 mm: Add a vm_special_mapping.fault() method
that added call of vm_special_mapping::fault if it's defined

- and uprobes code not initializing this fault pointer properly,
attached patch fixed the issue for me,
Oleg, I'm not sure this is how you want to fix this though..


however I still see the off by 1 as Pratyush said:
65536;;probe_exact:f_65535x;132185462;100.00

I have another patch making the ena/run times equal for software
events in read syscal, and that obviously works.. but I'm not sure
how fix this otherwise ATM

thanks,
jirka


---
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0167679182c0..0c045aad28a2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1169,7 +1169,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
uprobe_opcode_t insn = UPROBE_SWBP_INSN;
struct xol_area *area;

- area = kmalloc(sizeof(*area), GFP_KERNEL);
+ area = kzalloc(sizeof(*area), GFP_KERNEL);
if (unlikely(!area))
goto out;