Re: In many cases softlockup can not be reported after disabling IRQ for long time

From: Russell King - ARM Linux
Date: Sat Feb 04 2012 - 07:23:17 EST


On Thu, Feb 02, 2012 at 10:05:22PM +0800, TAO HU wrote:
> I don't know whether this has already been discussed.
> I'd appreciate it if you could point me to an existing discussion thread.
>
> I agree it is impossible to detect a "timeout" when using jiffies, which
> relies on the timer interrupt.
>
> For the timestamp, softlockup (watchdog) uses cpu_clock(), which eventually
> calls sched_clock().
> And sched_clock() is implemented to read out the value of a 32K
> timer/counter on OMAP4430.
> That means the timestamp will still be updated while IRQs are disabled.

Yes, and it'll take 131072 seconds to wrap.
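
For reference, that figure follows from the counter width and rate. A
minimal sketch of the arithmetic, assuming the counter is 32 bits wide
and ticks at 32768 Hz (counter_wrap_seconds() is a made-up helper for
illustration, not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: seconds until a free-running counter that is
 * `bits` wide and ticks at `rate_hz` wraps back to zero.
 * Assumption: the width and rate are supplied by the caller; nothing
 * here reads real hardware.
 */
uint64_t counter_wrap_seconds(unsigned int bits, uint32_t rate_hz)
{
	assert(bits > 0 && bits < 64 && rate_hz != 0);
	return (UINT64_C(1) << bits) / rate_hz;
}
```

Under those assumptions, counter_wrap_seconds(32, 32768) works out to
2^32 / 2^15 = 2^17 = 131072 seconds, or a little over 36 hours.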

> So when IRQ is re-enabled, the softlockup code will be able to read a
> "fresh" timestamp which can be used to detect the timeout.
>
>
> static unsigned long get_timestamp(int this_cpu)
> {
> 	return cpu_clock(this_cpu) >> 30LL; /* 2^30 ~= 10^9 */
> }
>
> unsigned long long __attribute__((weak)) sched_clock(void)
> {
> 	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> 		* (NSEC_PER_SEC / HZ);
> }
>
> #ifndef CONFIG_OMAP_MPU_TIMER
> unsigned long long notrace sched_clock(void)
> {
> 	return _omap_32k_sched_clock();
> }
> #else
> unsigned long long notrace omap_32k_sched_clock(void)
> {
> 	return _omap_32k_sched_clock();
> }
> #endif
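
To make the expected behaviour concrete, here is a rough userspace
sketch of the check being described. It is not the kernel's watchdog
code: ns_to_approx_seconds(), is_softlockup(), example() and the
threshold of 20 are all hypothetical, and only the ">> 30" conversion is
taken from the quoted get_timestamp().

```c
#include <assert.h>
#include <stdint.h>

/* Same conversion as the quoted get_timestamp(): 2^30 ns is about 1.07 s. */
uint64_t ns_to_approx_seconds(uint64_t ns)
{
	return ns >> 30;
}

/*
 * Hypothetical check: report a lockup if more than `thresh` approximate
 * seconds have passed since the watchdog last touched its timestamp.
 */
int is_softlockup(uint64_t touch_ts, uint64_t now, uint64_t thresh)
{
	return now > touch_ts && now - touch_ts > thresh;
}

/* Illustration only: timestamp touched at 10 s, IRQs off until 70 s. */
void example(void)
{
	uint64_t touch_ts = ns_to_approx_seconds(UINT64_C(10000000000));
	uint64_t now = ns_to_approx_seconds(UINT64_C(70000000000));

	/* The clock kept counting while IRQs were off, so the gap is visible. */
	assert(is_softlockup(touch_ts, now, 20));
}
```

If the clock source really does keep running with IRQs off, the first
check made after they are re-enabled should see the whole gap at once,
as example() does. With a jiffies-based sched_clock() the two readings
would be nearly equal and nothing would be reported.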

I guess someone needs to do some tracing to see what's going on, and
get a feel for the order in which things happen. (Or add some printks.)

Is there a ready-prepared bit of code I can try?