Re: perfevents: irq loop stuck!

From: Peter Zijlstra
Date: Fri May 16 2014 - 03:45:00 EST


On Fri, May 16, 2014 at 12:25:28AM -0400, Vince Weaver wrote:
> anyway I'm not sure if it's worth tracking this more if it's possible to
> mostly fix the case by fixing the sample_period bounds.

Right, so let's start with that; if it triggers again, we'll have another
look.

FWIW I ran with the below patch overnight, and while trinity completely
shat itself going OOM -- so I'm not sure how long it ran -- it didn't
trigger the stuck interrupt loop.

Will do more runs now that I'm there to hold its hand once more.

---
Subject: perf: Limit perf_event_attr::sample_period to 63 bits
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Thu May 15 20:23:48 CEST 2014

Vince reported that using a large sample_period (one with bit 63 set)
results in wreckage since, while the sample_period is fundamentally
unsigned (negative periods don't make sense), the way we implement
things very much relies on signed logic.

So limit sample_period to 63 bits to avoid tripping over this.

Reported-by: Vince Weaver <vincent.weaver@xxxxxxxxx>
Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@xxxxxxxxxxxxxx
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7058,6 +7058,9 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (attr.freq) {
 		if (attr.sample_freq > sysctl_perf_event_sample_rate)
 			return -EINVAL;
+	} else {
+		if (attr.sample_period & (1ULL << 63))
+			return -EINVAL;
 	}

 	/*
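
To illustrate why bit 63 is the cut-off, here is a small userspace sketch,
not kernel code. The helper names sample_period_valid() and
period_as_signed() are made up for this example, and the cast assumes the
usual two's-complement behaviour of gcc/clang (the C standard leaves that
conversion implementation-defined). It shows that any period with bit 63
set turns negative as soon as signed 64-bit arithmetic looks at it, which
is the case the check above rejects.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the test added by the patch:
 * a period is acceptable only if bit 63 is clear.
 */
static int sample_period_valid(uint64_t sample_period)
{
	return (sample_period & (1ULL << 63)) == 0;
}

/*
 * Hypothetical helper showing what signed logic sees.
 * Assumes two's-complement wraparound on the cast (gcc/clang behaviour);
 * the result is implementation-defined in ISO C for out-of-range values.
 */
static int64_t period_as_signed(uint64_t sample_period)
{
	return (int64_t)sample_period;
}
```

With these, (1ULL << 63) - 1 is the largest period that passes, while
1ULL << 63 is rejected and reads as a negative value through the signed
view.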
