Re: intel_pstate_timer_func divide by zero oops

From: Dirk Brandewie
Date: Thu Mar 28 2013 - 11:36:11 EST


On 03/27/2013 08:13 PM, Parag Warudkar wrote:
On Wed, Mar 27, 2013 at 10:51 PM, Dirk Brandewie
<dirk.brandewie@xxxxxxxxx> wrote:

Is there any way to capture the beginning of this trace?

I tried, but since the oops scrolls by fast and is followed by a hard
freeze, I wasn't able to capture it completely.
Maybe I can try netconsole and see if that helps.


pid_param_set() is on the stack which means that something is changing
the debugfs parameters or the stack is FUBAR.

I somehow doubt the stack is messed up as the call traces are always identical.
(pid_param_set() seems to be in first trace as well.)


I agree that the two oopses are likely the same, but unless something is
crawling through debugfs writing random values to the files there,
pid_param_set() should not be on any stack anywhere.

There was a similar bug reported by Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=920289

This bug has not shown up again since rc3. Can you try the current rc to
see if you still see the problem?


I don't see how duration_us can be zero unless somehow I am getting
back-to-back timer callbacks, which seems unlikely since the timer is not
re-armed until the timer function is about to return and the driver has
done all its work for the sample period.

Do the two oopses with a common call stack suggest back-to-back callbacks?

I will add some debugging checks tomorrow to see what is going on. But it
sounds like a minimal fix would be to guard against callbacks in quick
succession?
i.e. return from the sample path if ktime_us_delta(now, cpu->prev_sample)
is zero?

Thanks,
Parag
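
For illustration only, the guard suggested above might look roughly like
the userspace sketch below. It is not the driver's code: struct
sample_state, us_delta() and take_sample() are hypothetical names made up
for this example, us_delta() merely stands in for ktime_us_delta(), and
the division is an assumed placeholder for whatever the driver computes
from duration_us.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU state holding prev_sample. */
struct sample_state {
	int64_t prev_sample_us;	/* time of the last accepted sample, in us */
};

/* Userspace analogue of ktime_us_delta(later, earlier). */
static int64_t us_delta(int64_t later, int64_t earlier)
{
	return later - earlier;
}

/*
 * Take one sample. Returns false, leaving *rate_out and the saved
 * timestamp untouched, when no time has elapsed since the previous
 * sample, so the division below can never see a zero divisor.
 */
static bool take_sample(struct sample_state *st, int64_t now_us,
			int64_t delta_count, int64_t *rate_out)
{
	int64_t duration_us = us_delta(now_us, st->prev_sample_us);

	if (duration_us <= 0)
		return false;	/* back-to-back callback: skip this sample */

	*rate_out = delta_count / duration_us;	/* assumed use of duration_us */
	st->prev_sample_us = now_us;
	return true;
}
```

Leaving prev_sample_us unchanged on a skipped sample means the next
callback measures its duration from the last good sample rather than from
the rejected one. Whether that is the right behaviour for the real driver
is an open question; the sketch only shows the shape of the check.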

