Re: clocksource changes in 2.6.31 - possible regression

From: john stultz
Date: Mon Aug 17 2009 - 13:50:24 EST


On Mon, 2009-08-17 at 09:03 -0700, Stephen Hemminger wrote:
> The following commit causes a change for kernels built with HRT but
> not actually using HRT. I typically use the generic kernel we ship
> on test machines, and that kernel has NOHZ and HRT (for power savings/virt
> and HRT for QoS), but I want to be able to enable TSC as a clock source
> when doing performance tests with pktgen.
>
> The machine in question is a several year old Opteron box, that
> normally reports clocksources: acpi_pm jiffies tsc
> but now with 2.6.31-rc6, it only has acpi_pm.
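
For anyone following along, the list quoted above is what the kernel
exposes through sysfs. A minimal Python sketch for reading it, assuming
the usual clocksource0 sysfs directory; the helper names are made up for
illustration and are not part of any kernel tooling:

```python
from pathlib import Path

# Assumed sysfs location of the clocksource controls.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(text):
    """Split the space-separated contents of available_clocksource."""
    return text.split()


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (available, current) as reported under `base`."""
    base = Path(base)
    available = parse_clocksources((base / "available_clocksource").read_text())
    current = (base / "current_clocksource").read_text().strip()
    return available, current


if __name__ == "__main__":
    try:
        available, current = read_clocksources()
        print("available:", " ".join(available))
        print("current:  ", current)
    except OSError as err:
        # Not Linux, or sysfs is not mounted where we assumed.
        print("could not read clocksource info:", err)
```

Comparing the output of this on 2.6.30 and 2.6.31-rc6 is a quick way to
confirm which clocksources dropped off the list.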

I might need to review the patch again, but I believe we just don't
allow you to switch to non-HRT-compatible clocksources (like jiffies) if
we're already in HRT mode (since the system would hang after the switch).
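
To make the rule concrete, here is a toy Python model of it. This is a
sketch of the behavior as I described it above, not the kernel code: the
Clocksource class, the hres_ok field and both function names are
hypothetical, and the flag values in SOURCES simply assume my guess
about the TSC being marked non-HRT-compatible is right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clocksource:
    name: str
    hres_ok: bool  # hypothetical stand-in for "HRT compatible"


def selectable(sources, hres_active):
    """Clocksources the user may switch to in the current timer mode."""
    if not hres_active:
        # Outside HRT mode nothing is filtered.
        return list(sources)
    # In HRT mode only HRT-compatible clocksources stay selectable.
    return [cs for cs in sources if cs.hres_ok]


def switch_to(name, sources, hres_active):
    """Return the requested clocksource, or raise if the switch is refused."""
    for cs in selectable(sources, hres_active):
        if cs.name == name:
            return cs
    raise ValueError(f"cannot switch to {name!r} in the current timer mode")


# Assumed state of the reported box: TSC disqualified, jiffies never HRT capable.
SOURCES = [
    Clocksource("acpi_pm", hres_ok=True),
    Clocksource("jiffies", hres_ok=False),
    Clocksource("tsc", hres_ok=False),
]

if __name__ == "__main__":
    print([cs.name for cs in selectable(SOURCES, hres_active=True)])
    print([cs.name for cs in selectable(SOURCES, hres_active=False)])
```

With HRT active the model leaves only acpi_pm selectable, which matches
the report; with HRT inactive all three remain.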


The behavior you describe, where you can't switch to the TSC, may be due
to the TSC disqualification code marking it as non-HRT-compatible
(again, I need to double check). I'm not sure that's really correct in
general, as the TSC is fine for HRT. In this case, though, the TSC on
your box has been marked as unstable (likely due to being unsynced, as
is common on old AMD SMP systems). There is a real chance that the
timekeeping code on your system could see the TSC go backwards,
calculate a negative time interval, and then end up hanging.
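
A small Python sketch of that failure mode, with made-up counter values:
the last reading is taken on one CPU and the next reading on a CPU whose
TSC is slightly behind. The function names are hypothetical and this
only illustrates the arithmetic, not the actual timekeeping code.

```python
MASK64 = (1 << 64) - 1  # width of the TSC counter


def signed_delta(now, last):
    """Plain difference between two counter readings; may be negative."""
    return now - last


def elapsed_cycles(now, last, mask=MASK64):
    """Unsigned, wrapping difference, as done for a free-running counter."""
    return (now - last) & mask


if __name__ == "__main__":
    last = 1_000_000  # read on CPU 0
    now = 999_500     # read a moment later on CPU 1, whose TSC lags behind

    print(signed_delta(now, last))    # -500
    print(elapsed_cycles(now, last))  # 18446744073709551116, i.e. 2**64 - 500
```

Either way the interval is wrong: taken as signed, time steps backwards
by 500 cycles; taken as unsigned, a tiny backwards step turns into an
enormous forward jump.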

I suspect that's the reason it's been removed from the clocksource list.

Thomas: what's your take on this? It seems the proper fix would be to
maybe have a "go ahead, shoot yourself" boot option that disables the
TSC disqualification? Or should we not be flipping the HRT compatible
flag on the TSC clocksource on disqualification?


> Since HRT/NOHZ is not really runtime configurable, I think the
> proper behavior is:
>
> * kernel reports all possible clocksources and chooses the best
> by default
> * if user demands a different clocksource, the kernel should use that
> but degrade if necessary: ie. high-res timers have less (maybe even
> only HZ accuracy), and nohz should be automatically disabled if
> needed

Yeah, the way it is now is actually due to HRT/NOHZ not being runtime
configurable. Safely shutting down HRT/NOHZ is more difficult than the
transition to enabling it, so once it's on, it's on.

-john
