Re: Suspend-to-ram not working when ftrace is enabled, again!

From: Srivatsa S. Bhat
Date: Tue Mar 20 2012 - 09:58:03 EST


On 03/20/2012 07:12 PM, Steven Rostedt wrote:

> On Mon, 2012-03-19 at 21:16 +0530, Srivatsa S. Bhat wrote:
>> Hi,
>>
>> If tracing is enabled and we are tracing low-level suspend-to-ram related
>> functions like restore_processor_state() etc (which are included by default
>> in the list of traced functions), and we try suspending the machine, the
>> machine doesn't resume. It reboots instead.
>> (If we trace some unrelated functions like kzalloc() for example, there is
>> no problem with suspend/resume).
>
> Yeah, this is a known issue. I need to look at the suspend code and add
> notrace annotations, or keep entire files from being traced.
>
> The problem is that on resume, some functions are called before all of
> the kernel setup has been initialized. For example, smp_processor_id()
> uses the %gs register to access the per_cpu data, which also contains the
> cpu id. On resume, the %gs register is not yet set up, so calling the
> function tracer, which uses smp_processor_id() to find out what buffer
> to write to, causes a page fault. The page fault handling then also calls
> the function tracer, which page faults in turn, and we end up with a
> triple fault and the machine reboots.
>
>
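
To check my understanding of the recursion described above, here is a toy
userspace model of it. It is only a sketch: every toy_* name is made up for
illustration and none of them is a real kernel symbol, and the depth limit of
3 simply stands in for the point at which real hardware would triple fault
and reset.

```c
#include <stdbool.h>

/* Toy userspace model; none of these names are real kernel symbols. */
enum { TOY_TRIPLE_FAULT = 3 };

static bool toy_percpu_ready;   /* stands in for "%gs has been set up" */
static int  toy_fault_depth;    /* nested faults taken so far */

static void toy_page_fault(void);

/* Models the function tracer: it needs the CPU id to pick a buffer. */
static void toy_trace_hook(void)
{
    if (!toy_percpu_ready)
        toy_page_fault();       /* per-CPU access faults before %gs is set */
}

/* Models a fault handler whose entry is itself traced. */
static void toy_page_fault(void)
{
    if (++toy_fault_depth >= TOY_TRIPLE_FAULT)
        return;                 /* real hardware would reset the machine here */
    toy_trace_hook();           /* the handler is traced, so it faults again */
}

/* Returns the fault depth reached by one traced call during "resume". */
int toy_resume_call(bool percpu_ready)
{
    toy_percpu_ready = percpu_ready;
    toy_fault_depth = 0;
    toy_trace_hook();
    return toy_fault_depth;
}
```

In this model toy_resume_call(false) runs straight up to the limit, while
toy_resume_call(true) never faults, which matches the observation that
tracing unrelated functions is harmless.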


In that case, I wonder why your patch to disable tracing during suspend
was reverted at all?! (commit cbe2f5a6e84)

>>
>> Looking at https://lkml.org/lkml/2008/8/27/177, it appears that this
>> is an old problem and also had a workaround (disabling tracing around
>> suspend). The above patch corresponds to commit id: f42ac38c59 (ftrace:
>> disable tracing for suspend to ram), which went in around 2.6.27 I think.
>> But then commit cbe2f5a6e84 (tracing: allow tracing of suspend/resume &
>> hibernation code again) reverted that commit.
>>
>> And from https://lkml.org/lkml/2008/8/21/349, it looks like 2.6.28 and
>> later were supposed to be problem-free. But unfortunately this problem has
>> resurfaced.
>>
>> I tested kernel 2.6.32.54 and I observed that the machine reboots during
>> resume, which looks exactly like the problem discussed in the link above.
>>
>> On another machine, I tested 3.3-rc6 and it doesn't seem to respond to
>> resume events (like button press, lid open) at all. It just seems to remain
>> suspended forever.
>>
>> Should we resort to disabling ftrace around suspend again? Or do we have a
>> better solution this time around?
>>
>
> No, the real solution is to find the functions that break and fix them.
> Probably requires more notrace annotations.
>
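
For reference, a userspace analogue of what a notrace annotation does. This
is a sketch under assumptions, not the actual fix: as far as I know the
kernel's notrace expands to the GCC attribute used below, and the demo relies
on GCC's -finstrument-functions hooks rather than the mcount-based mechanism
ftrace uses. The names early_resume_step() and ordinary_step() are
hypothetical.

```c
/* As far as I know, this is what the kernel's notrace expands to. */
#define notrace __attribute__((no_instrument_function))

static int hook_calls;          /* how many times the entry hook ran */

/*
 * Hooks that GCC calls on function entry/exit when the file is built with
 * -finstrument-functions. They must not be instrumented themselves, or
 * they would recurse into each other.
 */
notrace void __cyg_profile_func_enter(void *fn, void *call_site)
{
    (void)fn;
    (void)call_site;
    hook_calls++;
}

notrace void __cyg_profile_func_exit(void *fn, void *call_site)
{
    (void)fn;
    (void)call_site;
}

/* Hypothetical early-resume helper: annotated, so no hook call is emitted. */
notrace int early_resume_step(int x)
{
    return x + 1;
}

/* Ordinary function: instrumented when -finstrument-functions is used. */
int ordinary_step(int x)
{
    return x + 1;
}

/* Accessor kept untraced so that reading the counter does not change it. */
notrace int hook_call_count(void)
{
    return hook_calls;
}
```

Built with gcc -finstrument-functions, only calls to ordinary_step() should
reach the hooks; early_resume_step() runs without touching them, which is
the property the early resume path needs.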


Regards,
Srivatsa S. Bhat
