Re: Hang and Soft Lockup problems with generic time code

From: john stultz
Date: Sat Jul 08 2006 - 17:45:44 EST


On Fri, 2006-07-07 at 23:36 -0500, James Bottomley wrote:
> > Did you really mean jumps of 200 seconds? Hmmm. The issue Roman and I
> > have been looking into does occur when we lose a number of ticks and
> > that confuses the clocksource adjustment code. The fix we're working
> > on
> > corrects the adjustment confusion, but doesn't fix the lost ticks.
> >
> > However 200 seconds of lost ticks sounds very off. Could the driver be
> > disabling interrupt for such a long period of time?
>
> Well, what I was seeing was that
>
> clocksource_read(clock) - clock->cycle_last
>
> is returning a value about 200 x clock->cycle_interval

That then would be ~200 ticks. Is this at HZ=1000 ?

> According to the debugging printks I put into update_wall_time(). I was
> assuming this was caused by a jump in the TSC count, but I suppose it
> could also be cause by spurious alterations to cycle_last or other
> effects I haven't traced.

Since this issue effected both the TSC and ACPI PM timer, I'd more
likely suspect something is holding off the timer interrupt. This could
be some kernel code like a driver, or it could be something like an SMI
from the BIOS.

thanks
-john

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/