Re: clockevents: fix resume logic

From: Thomas Gleixner
Date: Wed Sep 12 2007 - 12:57:57 EST


On Wed, 2007-09-12 at 02:16 -0700, Andrew Morton wrote:
> On Tue, 11 Sep 2007 23:49:47 +0200 Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> > On Tue, 2007-09-11 at 21:52 +0200, Thomas Gleixner wrote:
> > > > C1: type[C1] promotion[C2] demotion[--] latency[001] usage[00000010] duration[00000000000000000000]
> > > > *C2: type[C2] promotion[--] demotion[C1] latency[001] usage[00008316] duration[00000000000170717293]
> > >
> > > Ok, here we are. The bad one uses C2, which stops the local apic on
> > > the VAIO. I suspect that after suspend/resume we end up going into C2
> > > without the broadcast active.
> > >
> > > Can you try to get the output of SysRq-Q during the "it needs help from
> > > keyboard" period ?
> >
> > Summary of the oddities we are seeing:
> >
> > 1.) disabling local apic timer makes the problem go away
> > 2.) disabling nohz and highres makes the problem go away
> > 3.) adding the cpuidle patches from the acpi tree makes the problem go
> > away
>
> Only the first cpuidle patch, actually. I'd have thought this was a big hint?

Yes, it prevents the cpu from entering C2, which keeps the local APIC alive.

> > The obvious conclusion is that in all other cases we run into a state
> > where the local apic timer is not working.
> >
> > 1) we do not use it
> > 2) it is used in periodic mode
> > 3) the cpu does not enter C2 (which turns the lapic timer off on the
> > VAIO)
> >
> > While 1) and 3) are understandable, the reason why 2) is working is a
> > mystery because the periodic mode is affected by the C state as well.
> >
> > Andrew, can you please provide the output of /proc/timer_list when you
> > boot the kernel with "nohz=off highres=off", but honestly I do not
> > expect a lot of enlightenment from it.

> Tick Device: mode: 0
> Clock Event Device: pit
> max_delta_ns: 27461866
> min_delta_ns: 12571
> mult: 5124677
> shift: 32
> mode: 2
> next_event: 9223372036854775807 nsecs
> set_next_event: pit_next_event
> set_mode: init_pit_timer
> event_handler: tick_handle_periodic_broadcast
> tick_broadcast_mask: 00000001
> tick_broadcast_oneshot_mask: 00000000
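
For reference, the max_delta_ns and min_delta_ns values in this output
follow from mult and shift. A minimal sketch, assuming the conversion is
ns = (ticks << shift) / mult as in the kernel's clockevent_delta2ns()
(the kernel's clamping of the result is omitted here), and assuming the
PIT limits are 0x7FFF and 0xF ticks. Those two tick counts are not stated
in this thread, and delta_to_ns is a hypothetical name used only for
illustration:

```c
#include <stdint.h>

/*
 * Convert a number of clock event device ticks to nanoseconds using the
 * device's mult/shift pair: ns = (ticks << shift) / mult.
 * Hypothetical helper; the kernel additionally clamps the result.
 */
static uint64_t delta_to_ns(uint64_t ticks, uint32_t mult, uint32_t shift)
{
	return (ticks << shift) / mult;
}
```

With mult 5124677 and shift 32 from the listing, delta_to_ns(0x7FFF, ...)
gives 27461866 and delta_to_ns(0xF, ...) gives 12571, which matches the
max_delta_ns and min_delta_ns lines. The next_event value
9223372036854775807 is the largest signed 64-bit number, which suggests
that no oneshot event is programmed on the pit device.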

Ok, I got my brain together and read through the code. The periodic mode
switches to broadcast right away when the acpi code discovers that a
C-state which stops the lapic is available.

Does the test hack below fix the problem for nohz/highres enabled
kernels ?

tglx

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -382,6 +382,8 @@ static int tick_broadcast_set_event(ktime_t expires, int force)
 
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {
+	cpu_set(smp_processor_id(), tick_broadcast_oneshot_mask);
+
 	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 
 	if(!cpus_empty(tick_broadcast_oneshot_mask))
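
The effect of the added cpu_set() on the condition that follows it can be
modelled in userspace. This is only a sketch under stated assumptions:
toy_cpumask, struct toy_resume_result and toy_resume_broadcast_oneshot
are hypothetical names, the mask is a plain bitmask with one bit per cpu,
and the model records nothing more than whether the cpus_empty() check
would pass. What the kernel does inside that branch is not shown in the
hunk and is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel cpumask: one bit per cpu. */
typedef unsigned long toy_cpumask;

struct toy_resume_result {
	toy_cpumask oneshot_mask;	/* mask as seen by the if() check */
	bool mask_not_empty;		/* true if !cpus_empty() would hold */
};

/*
 * Model of the resume path: with_hack selects whether the resuming cpu
 * sets its own bit in the oneshot mask before the mask is tested.
 */
static struct toy_resume_result
toy_resume_broadcast_oneshot(toy_cpumask oneshot_mask, unsigned int cpu,
			     bool with_hack)
{
	struct toy_resume_result r;

	if (with_hack)
		oneshot_mask |= 1UL << cpu;

	r.oneshot_mask = oneshot_mask;
	r.mask_not_empty = (oneshot_mask != 0);
	return r;
}
```

Starting from an empty mask on cpu 0, the model reports an empty mask
without the hack and a mask of 0x1 with it, so only the hacked variant
would take the branch after the mode switch.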



