Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5

From: Ingo Molnar
Date: Tue Oct 18 2005 - 03:48:53 EST

Next message: Arjan van de Ven: "Re: [PATCH] disable PREEMPT_BKL per default"
Previous message: Carsten Otte: "Re: [Patch 2/3] Export get_one_pte_map."
In reply to: Roman Zippel: "Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5"
Next in thread: Tim Bird: "Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Roman Zippel <zippel@xxxxxxxxxxxxxx> wrote:

> On Mon, 17 Oct 2005, Ingo Molnar wrote:
>
> > why you insist on ktimers being 'process timers'?
>
> Because they are optimized for process usage. OTOH kernel usage is
> more than just "timeouts".

you have cut out the rest of what i write in the paragraph, which IMO
answers your question:

> > They are totally separate entities, not limited to any process
> > notion. One of their first practical use happens to be POSIX process
> > timers (both itimers and ptimers) via them, but no way are ktimers
> > only 'process timers'. They are very generic timers, usable for any
> > kernel purpose.

so i can only repeat that ktimers is a generic timer subsystem, with a
focus on _actually delivering a timer event_.

and no, ktimers are not "optimized for process usage" (or tied to
whatever other process notion, as i said before), they are optimized
for:

- the delivery of time related events

as contrasted to the timeout-API (a'ka "timer wheel") code in
kernel/timers.c that is optimized towards:

- the fast adding/removal of timers

without too much focus on robust and deterministic delivery of events.

these two concepts are conflicting, and i claim that a (sane) data
structure that maximally fulfills both sets of requirements does not
exist, mathematically. (to repeat, the requirements are: 'fast
add/remove' and 'fast+deterministic expiry')

at this point i'd really suggest for readers to lean back and think
about the mathematical foundations of timer data structures for a bit,
with a focus on the tradeoffs that the timer wheel data structure has,
vs. the tradeoffs of the rbtree data structure that ktimers has.

My claim is that if you _know_ that a timer will expire most likely, you
want it to order at insertion time - i.e. you want to have a tree
structure. If you _know_ that a timer will most likely _not_ expire,
then you can avoid the tree overhead by 'delaying' the decision of
sorting timers, to the point in the future where we really are forced to
do so.

The result of this mathematical paradox is that we end up with two data
structures: one is the timer wheel (kernel/timers.c) for
timeout/exception related use; the other one is ktimers
(kernel/ktimers.c), for expiry oriented use.

> > so to answer your question: it is totally possible for a watchdog
> > mechanism to use ktimers. In fact it would be desirable from a
> > robustness POV too:
>
> "possible" and "desirable" is still different from "preferable", as
> they involve a higher cost.

[ in my answer above you are free to substitute "preferable" with
"desirable" - i do mean it as it reads in plain English. ]

> > e.g. we dont want a watchdog from being
> > overload-able via too many timeouts in the timer wheel ...
>
> Please explain.

e.g. on busy networked servers (i.e. ones that do have a need for
watchdogs) the timer wheel often includes large numbers of timeouts,
99.9% of which never expire. If they do expire en masse for whatever
reason, then we can get into overload mode: a million timers might have
to expire before we get to process the watchdog event and act upon it.
This can delay the watchdog event significantly, which delay might (or
might not) matter to the watchdog application.

in short: the timer wheel was not designed with determinism in mind (nor
should 'simple timeouts' care about determinism). Watchdogs are
preferably (and desirably) implemented via the most deterministic timer
mechanism that the kernel offers: ktimers in this particular case.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Arjan van de Ven: "Re: [PATCH] disable PREEMPT_BKL per default"
Previous message: Carsten Otte: "Re: [Patch 2/3] Export get_one_pte_map."
In reply to: Roman Zippel: "Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5"
Next in thread: Tim Bird: "Re: [PATCH] ktimers subsystem 2.6.14-rc2-kt5"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]