Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context

From: Sasha Levin
Date: Fri Jan 30 2015 - 14:59:07 EST


On 01/28/2015 04:02 PM, Andy Lutomirski wrote:
> On Wed, Jan 28, 2015 at 9:48 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, Jan 28, 2015 at 08:33:06AM -0800, Andy Lutomirski wrote:
>>> On Fri, Jan 23, 2015 at 5:25 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> On Fri, Jan 23, 2015 at 12:48 PM, Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>>>>> On 01/23/2015 01:34 PM, Andy Lutomirski wrote:
>>>>>> On Fri, Jan 23, 2015 at 10:04 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>>>>>>> On Fri, Jan 23, 2015 at 09:58:01AM -0800, Andy Lutomirski wrote:
>>>>>>>>> [ 543.999079] Call Trace:
>>>>>>>>> [ 543.999079] dump_stack (lib/dump_stack.c:52)
>>>>>>>>> [ 543.999079] lockdep_rcu_suspicious (kernel/locking/lockdep.c:4259)
>>>>>>>>> [ 543.999079] atomic_notifier_call_chain (include/linux/rcupdate.h:892 kernel/notifier.c:182 kernel/notifier.c:193)
>>>>>>>>> [ 543.999079] ? atomic_notifier_call_chain (kernel/notifier.c:192)
>>>>>>>>> [ 543.999079] notify_die (kernel/notifier.c:538)
>>>>>>>>> [ 543.999079] ? atomic_notifier_call_chain (kernel/notifier.c:538)
>>>>>>>>> [ 543.999079] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
>>>>>>>>> [ 543.999079] do_debug (arch/x86/kernel/traps.c:652)
>>>>>>>>> [ 543.999079] ? trace_hardirqs_on (kernel/locking/lockdep.c:2609)
>>>>>>>>> [ 543.999079] ? do_int3 (arch/x86/kernel/traps.c:610)
>>>>>>>>> [ 543.999079] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601)
>>>>>>>>> [ 543.999079] debug (arch/x86/kernel/entry_64.S:1310)
>>>>>>>>
>>>>>>>> I don't know how to read this stack trace. Are we in do_int3,
>>>>>>>> do_debug, or both? I didn't change do_debug at all.
>>>>>>>
>>>>>>> It looks like we're in do_debug. do_int3 is only on the stack but not
>>>>>>> part of the current frame if I can trust the '?' ...
>>>>>>>
>>>>>>
>>>>>> It's possible that an int3 happened and I did something wrong on
>>>>>> return that caused a subsequent do_debug to screw up, but I don't see
>>>>>> how my patch would have caused that.
>>>>>>
>>>>>> Were there any earlier log messages?
>>>>>
>>>>> Nope, nothing odd before or after.
>>>>
>>>> Trinity just survived for a decent amount of time for me with my
>>>> patches, other than a bunch of apparently expected OOM kills. I have
>>>> no idea how to tell trinity how much memory to use.
>>>
>>> A longer trinity run on a larger VM survived (still with some OOM
>>> kills, but no taint) with these patches. I suspect that it's a
>>> regression somewhere else in the RCU changes. I have
>>> CONFIG_PROVE_RCU=y, so I should have seen the failure if it was there,
>>> I think.
>>
>> If by "RCU changes" you mean my changes to the RCU infrastructure, I am
>> going to need more of a hint than I see in this thread thus far. ;-)
>>
>
> I can't help much, since I can't reproduce the problem. Presumably if
> it's a bug in -tip, someone else will trigger it, too.

I'm not sure what to tell you here; I'm not using any unusual trinity options
to reproduce it.

It doesn't happen too frequently, but I still see it happening.

Would you like me to try a debug patch or something similar?
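For example, something along these lines is what I had in mind (untested,
just a sketch; it assumes the splat means RCU isn't watching when do_debug()
runs, and the placement near the top of do_debug() in arch/x86/kernel/traps.c,
before the notify_die() call, is only a guess based on the trace above):

	/*
	 * Flag the case where the debug exception arrives while RCU
	 * considers this CPU idle; that would explain the RCU-lockdep
	 * complaint coming out of the notifier call chain.
	 */
	WARN_ONCE(!rcu_is_watching(),
		  "do_debug entered with RCU not watching\n");

If that fires, it would at least narrow down whether the problem is the
IST entry/exit accounting or something else in the notifier path.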


Thanks,
Sasha
