Re: [PATCH 1/2] time: logrithmic time accumulation

From: John Kacur
Date: Mon Oct 05 2009 - 06:46:22 EST

Next message: K.Prasad: "Re: [PATCH -tip] x86: hw-breakpoints: Actually flush threadbreakpoints in flush_thread()."
Previous message: Ryo Tsuruta: "Re: IO scheduler based IO controller V10"
In reply to: John Kacur: "Re: [PATCH 2/2] time: remove xtime_cache"
Next in thread: Ingo Molnar: "Re: [PATCH 1/2] time: logrithmic time accumulation"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 2 Oct 2009, john stultz wrote:

> Accumulating one tick at a time works well unless we're using NOHZ. Then
> it can be an issue, since we may have to run through the loop a few
> thousand times, which can increase timer interrupt caused latency.
>
> The current solution was to accumulate in half-second intervals with
> NOHZ. This kept the number of loops down, however it did slightly change
> how we make NTP adjustments. While not an issue with NTPd users, as NTPd
> makes adjustments over a longer period of time, other adjtimex() users
> have noticed the half-second granularity with which we can apply
> frequency changes to the clock.
>
> For instance, if a application tries to apply a 100ppm frequency
> correction for 20ms to correct a 2us offset, with NOHZ they either get
> no correction, or a 50us correction.
>
> Now, there will always be some granularity error for applying frequency
> corrections. However with users sensitive to this error have seen a
> 50-500x increase with NOHZ compared to running without NOHZ.
>
> So I figured I'd try another approach then just simply increasing the
> interval. My approach is to consume the time interval logarithmically.
> This reduces the number of times through the loop needed keeping
> latency down, while still preserving the original granularity error for
> adjtimex() changes.
>
> Further, this change allows us to remove the xtime_cache code (patch to
> follow), as xtime is always within one tick of the current time, instead
> of the half-second updates it saw before.
>
> An earlier version of this patch has been shipping to x86 users in the
> RedHat MRG releases for awhile without issue, but I've reworked this
> version to be even more careful about avoiding possible overflows if the
> shift value gets too large.
>
> Since this is not the most trivial code, and its slightly different then
> whats been tested for awhile, it would be good to get this into some
> trees for testing. Be it -tip or -mm, either would work. If there's no
> problems it could be a 2.6.33 or 2.6.34 item.
>
> Any comments or feedback would be appreciated!
>
> Signed-off-by: John Stultz <johnstul@xxxxxxxxxx>
>
> diff --git a/include/linux/timex.h b/include/linux/timex.h
> index e6967d1..0c0ef7d 100644
> --- a/include/linux/timex.h
> +++ b/include/linux/timex.h
> @@ -261,11 +261,7 @@ static inline int ntp_synced(void)
>
> #define NTP_SCALE_SHIFT 32
>
> -#ifdef CONFIG_NO_HZ
> -#define NTP_INTERVAL_FREQ (2)
> -#else
> #define NTP_INTERVAL_FREQ (HZ)
> -#endif
> #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
>
> /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index fb0f46f..4cc5656 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -721,6 +721,51 @@ static void timekeeping_adjust(s64 offset)
> timekeeper.ntp_error_shift;
> }
>
> +
> +/**
> + * logarithmic_accumulation - shifted accumulation of cycles
> + *
> + * This functions accumulates a shifted interval of cycles into
> + * into a shifted interval nanoseconds. Allows for O(log) accumulation
> + * loop.
> + *
> + * Returns the unconsumed cycles.
> + */
> +static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
> +{
> + u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift;
> +
> + /* If the offset is smaller then a shifted interval, do nothing */
> + if (offset < timekeeper.cycle_interval<<shift)
> + return offset;
> +
> + /* accumulate one shifted interval */
> + offset -= timekeeper.cycle_interval << shift;
> + timekeeper.clock->cycle_last += timekeeper.cycle_interval << shift;
> +
> + timekeeper.xtime_nsec += timekeeper.xtime_interval << shift;
> + while (timekeeper.xtime_nsec >= nsecps) {
> + timekeeper.xtime_nsec -= nsecps;
> + xtime.tv_sec++;
> + second_overflow();
> + }
> +
> + /* accumulate into raw time */
> + raw_time.tv_nsec += timekeeper.raw_interval << shift;;
> + while (raw_time.tv_nsec >= NSEC_PER_SEC) {
> + raw_time.tv_nsec -= NSEC_PER_SEC;
> + raw_time.tv_sec++;
> + }
> +
> + /* accumulate error between NTP and clock interval */
> + timekeeper.ntp_error += tick_length << shift;
> + timekeeper.ntp_error -= timekeeper.xtime_interval <<
> + (timekeeper.ntp_error_shift + shift);
> +
> + return offset;
> +}
> +
> +
> /**
> * update_wall_time - Uses the current clocksource to increment the wall time
> *
> @@ -731,6 +776,7 @@ void update_wall_time(void)
> struct clocksource *clock;
> cycle_t offset;
> u64 nsecs;
> + int shift = 0, maxshift;
>
> /* Make sure we're fully resumed: */
> if (unlikely(timekeeping_suspended))
> @@ -744,33 +790,22 @@ void update_wall_time(void)
> #endif
> timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
>
> - /* normally this loop will run just once, however in the
> - * case of lost or late ticks, it will accumulate correctly.
> + /*
> + * With NO_HZ we may have to accumulate many cycle_intervals
> + * (think "ticks") worth of time at once. To do this efficiently,
> + * we calculate the largest doubling multiple of cycle_intervals
> + * that is smaller then the offset. We then accumulate that
> + * chunk in one go, and then try to consume the next smaller
> + * doubled multiple.
> */
> + shift = ilog2(offset) - ilog2(timekeeper.cycle_interval);
> + shift = max(0, shift);
> + /* Bound shift to one less then what overflows tick_length */
> + maxshift = (8*sizeof(tick_length) - (ilog2(tick_length)+1)) - 1;
> + shift = min(shift, maxshift);
> while (offset >= timekeeper.cycle_interval) {
> - u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift;
> -
> - /* accumulate one interval */
> - offset -= timekeeper.cycle_interval;
> - clock->cycle_last += timekeeper.cycle_interval;
> -
> - timekeeper.xtime_nsec += timekeeper.xtime_interval;
> - if (timekeeper.xtime_nsec >= nsecps) {
> - timekeeper.xtime_nsec -= nsecps;
> - xtime.tv_sec++;
> - second_overflow();
> - }
> -
> - raw_time.tv_nsec += timekeeper.raw_interval;
> - if (raw_time.tv_nsec >= NSEC_PER_SEC) {
> - raw_time.tv_nsec -= NSEC_PER_SEC;
> - raw_time.tv_sec++;
> - }
> -
> - /* accumulate error between NTP and clock interval */
> - timekeeper.ntp_error += tick_length;
> - timekeeper.ntp_error -= timekeeper.xtime_interval <<
> - timekeeper.ntp_error_shift;
> + offset = logarithmic_accumulation(offset, shift);
> + shift--;
> }
>
> /* correct the clock when NTP error is too big */
>
>
>

There are several (6) trailing whitespace errors that checkpatch exposes,
but other than that:
Signed-off-by: John Kacur <jkacur@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: K.Prasad: "Re: [PATCH -tip] x86: hw-breakpoints: Actually flush threadbreakpoints in flush_thread()."
Previous message: Ryo Tsuruta: "Re: IO scheduler based IO controller V10"
In reply to: John Kacur: "Re: [PATCH 2/2] time: remove xtime_cache"
Next in thread: Ingo Molnar: "Re: [PATCH 1/2] time: logrithmic time accumulation"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]