Re: [BUG] Stack overflow when running perf and function tracer

From: Thomas Gleixner
Date: Fri Oct 30 2020 - 06:26:05 EST


On Fri, Oct 30 2020 at 10:00, Peter Zijlstra wrote:
> On Fri, Oct 30, 2020 at 12:27:22AM -0400, Steven Rostedt wrote:
>> I found a bug in the recursion protection that prevented function
>> tracing from running in NMI context. Applying this fix to 5.9 worked
>> fine (tested by running perf record and function tracing at the same
>> time). But when I applied the patch to 5.10-rc1, it blew up with a
>> stack overflow:
>
> So we just blew away our NMI stack, right?

Looks like that:

>> RSP: 0018:fffffe000003c000 EFLAGS: 00010046

Clearly a page boundary.

>> RAX: 000000000000001c RBX: ffff928ada27b400 RCX: 0000000000000000
>> RDX: ffff928ada07b200 RSI: fffffe000003c028 RDI: ffff928ada27b400
>> RBP: ffff928ada27b4f0 R08: 0000000000000001 R09: 0000000000000000
>> R10: fffffe000003c440 R11: ffff928a7383cc60 R12: fffffe000003c028
>> R13: 00000000000003e8 R14: 0000000000000046 R15: 0000000000110001
>> FS: 00007f25d43cf780(0000) GS:ffff928adaa40000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: fffffe000003bff8 CR3: 00000000b52a8005 CR4: 00000000001707e0

and CR2 says the faulting access was just below it.

>> I bisected it down to:
>>
>> 35d1ce6bec133679ff16325d335217f108b84871 ("perf/x86/intel/ds: Fix
>> x86_pmu_stop warning for large PEBS")
>>
>> Which looks to be storing an awful lot on the stack:
>>
>> static void __intel_pmu_pebs_event(struct perf_event *event,
>>                                    struct pt_regs *iregs,
>>                                    void *base, void *top,
>>                                    int bit, int count,
>>                                    void (*setup_sample)(struct perf_event *,
>>                                                         struct pt_regs *,
>>                                                         void *,
>>                                                         struct perf_sample_data *,
>>                                                         struct pt_regs *))
>> {
>>         struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>         struct hw_perf_event *hwc = &event->hw;
>>         struct perf_sample_data data;
>>         struct x86_perf_regs perf_regs;
>>         struct pt_regs *regs = &perf_regs.regs;
>>         void *at = get_next_pebs_record_by_bit(base, top, bit);
>>         struct pt_regs dummy_iregs;
>
> The only thing I can come up with in a hurry is that that dummy_iregs
> thing really should be static. That's 168 bytes of stack out the window
> right there.

What's worse is perf_sample_data, which is 384 bytes and 64-byte aligned.

> Still, this seems to suggest (barring some actual issue hiding in those
> 135 lost lines) that we're very close to the limit on the NMI stack, which
> is a single 4k page IIRC.

Yes, unless KASAN is enabled.

Thanks,

tglx