Re: Linux 4.14-rc6: WARNING: CPU: 9 PID: 5377 at arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0

From: Peter Zijlstra
Date: Tue Oct 31 2017 - 10:57:40 EST


On Mon, Oct 30, 2017 at 11:49:54PM +0100, Fengguang Wu wrote:
> On Mon, Oct 30, 2017 at 11:02:58AM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 30, 2017 at 07:27:36AM +0100, Fengguang Wu wrote:
> >
> > > [ 189.480568] perf: interrupt took too long (5132 > 4982), lowering kernel.perf_event_max_sample_rate to 38000
> > > [ 189.690660] perf: interrupt took too long (6582 > 6415), lowering kernel.perf_event_max_sample_rate to 30000
> > > [ 189.901706] perf: interrupt took too long (8268 > 8227), lowering kernel.perf_event_max_sample_rate to 24000
> > > [ 272.841032] perfevents: irq loop stuck!
> > > [ 272.841038] ------------[ cut here ]------------
> > > [ 272.841046] WARNING: CPU: 9 PID: 5377 at arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0
> >
> > So I've not seen this in a fair while; is this new in 4.14?
>
> It looks like a pretty old error. Here is the dmesg for 4.12:
>
> [ 229.514000] Test Case count_global_group_cpu/mem-loads/_cpu/cache-references/_cpu/stalled-cycles-backend/_u PASS!
> [ 229.514002]
> [ 229.519591] Test Case count_global_group_cpu/mem-loads/_cpu/cache-references/_cpu/stalled-cycles-backend/_k PASS!
> [ 229.519594]
> [ 229.521742] ROUND : perf hardware event sample group test
> [ 229.521744]
> [ 229.689807] perfevents: irq loop stuck!
> [ 229.689807] ------------[ cut here ]------------
> [ 229.689809] WARNING: CPU: 4 PID: 23149 at arch/x86/events/intel/core.c:2114 intel_pmu_handle_irq+0x4a8/0x4c0

> [ 229.689828] CPU: 4 PID: 23149 Comm: perf Not tainted 4.12.0 #1
> [ 229.689829] Hardware name: Dell Inc. Studio XPS 8000/0X231R, BIOS A01 08/11/2009

Ok, that's a NHM (Nehalem) client if my google skillz are any good.

Is there a specific workload that makes this happen more than any other?
That is, what should I attempt to reproduce?