Re: kernel/timer.c design

From: Thomas Gleixner
Date: Wed Oct 19 2005 - 14:02:23 EST


On Wed, 2005-10-19 at 11:00 -0700, Tim Bird wrote:
> > But there's a hidden win as well from this approach: if a timer is
> > removed before it expires, we've saved the remaining cascading steps!
> > This happens surprisingly often: on a busy networked server, the
> > majority of the timers never expire, and are removed before they have to
> > be cascaded even once.
>
> Unfortunately, this means that the actual costs of the wheel
> implementation vary depending on the relationship between HZ,
> the average timeout duration, and the bucket mappings (which,
> as you say, can be adjusted for size reasons.) This is one of
> the downsides of the wheel implementation. It's very difficult
> to tell in advance whether a particular timer load
> will cascade or not, making the costs (although bounded)
> unexpectedly variable.

Thats exactly the problem we described earlier in the ktimer discussion:

Changing HZ from 100 to 1000 while keeping the primary wheel size
unchanged caused increased cascading load.

HZ CONFIG_BASE_SMALL=n CONFIG_BASE_SMALL=y
100 2560 ms 640 ms
250 1024 ms 256 ms
1000 256 ms 64 ms

A lot of timeouts are in the range of 500ms. While the HZ=100 and HZ=250
settings keep them in the primary wheel either until expiry or early
removal, HZ=1000 and CONFIG_BASE_SMALL with HZ > 100 make cascading more
likely when the system load goes up.

Thats hard to balance for sure.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/