hrtimer and context switch overhead -- udelay to "usleep"?

From: Patrick Pannuto
Date: Mon Jun 21 2010 - 14:06:03 EST


Circa 2007 there was talk of a patch to replace the underlying msleep
implementation with hrtimers ( http://lkml.org/lkml/2007/8/3/250 ); this
ended up not happening for several reasons, but the idea of a usleep
call built on hrtimers was tossed around at one point in the discussion.
However, it seems that the usleep idea never made it past that thread.

I was wondering, then, about the potential overhead, cost, and worth of
a usleep function. Consider the hypothetical usleep:

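#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/module.h>
#include <linux/sched.h>

static int __sched do_usleep_range(unsigned long min, unsigned long max)
{
	ktime_t kmin;
	unsigned long delta;

	kmin = ktime_set(0, min * NSEC_PER_USEC);
	/* schedule_hrtimeout_range() takes its slack in nanoseconds */
	delta = (max - min) * NSEC_PER_USEC;
	return schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
}

/**
 * usleep_range - Drop-in replacement for udelay where wakeup is flexible
 * @min: Minimum time in usecs to sleep
 * @max: Maximum time in usecs to sleep
 */
void usleep_range(unsigned long min, unsigned long max)
{
	/* sleep until the timer fires; signals do not cut the sleep short */
	__set_current_state(TASK_UNINTERRUPTIBLE);
	do_usleep_range(min, max);
}
EXPORT_SYMBOL(usleep_range);

/* would belong in a header, next to the usleep_range() prototype */
static inline void usleep(unsigned long usecs)
{
	usleep_range(usecs, usecs);
}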
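To spell out the units: schedule_hrtimeout_range() takes its slack
(delta) in nanoseconds, so both ends of the range need the
NSEC_PER_USEC scaling. The following is a small userspace model, not
kernel code; struct sleep_window and usleep_window() are hypothetical
names, and all it does is compute the soft and hard expiry that a
given [min, max] range maps to.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Hypothetical model of the expiry window handed to the hrtimer:
 * the timer may fire anywhere from soft_ns up to hard_ns. */
struct sleep_window {
	uint64_t soft_ns;  /* earliest wake-up: min usecs, in ns */
	uint64_t hard_ns;  /* latest wake-up: soft expiry plus slack, in ns */
};

static struct sleep_window usleep_window(unsigned long min_us,
					 unsigned long max_us)
{
	struct sleep_window w;

	assert(max_us >= min_us);  /* a reversed range would underflow */
	w.soft_ns = (uint64_t)min_us * NSEC_PER_USEC;
	/* The slack must be scaled too; passing (max - min) unscaled
	 * would shrink a 100us range down to 100ns. */
	w.hard_ns = w.soft_ns + (uint64_t)(max_us - min_us) * NSEC_PER_USEC;
	return w;
}
```

For usleep_range(100, 200) this gives a window of 100000ns to
200000ns; without the scaling on delta it would collapse to 100000ns
to 100100ns.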


I have been doing some device driver work lately, and an unfortunately
common bit of code seems to be:

write_some_bits(...);
/* Need to let hardware latch */
udelay(100);

100us is a fairly middle-ground value; there are many calls to udelay(1)
through udelay(5), as well as some as high as udelay(800)! That is an
awfully long time to busy-wait, particularly when another part of the
kernel could be doing something useful with the CPU.
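To get a feel for the difference from userspace, the sketch below
compares a busy-wait on CLOCK_MONOTONIC (standing in for udelay) with
a blocking nanosleep() (standing in for usleep), and reports both the
wall-clock time and the CPU time each one consumes. It is only an
analogue, assuming userspace behaviour is representative; spin_us(),
sleep_us() and measure() are hypothetical helpers, not kernel
interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Read a clock and return its value in nanoseconds. */
static uint64_t now_ns(clockid_t clk)
{
	struct timespec ts;

	clock_gettime(clk, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* udelay-like busy-wait: keeps the CPU busy until the deadline passes. */
static void spin_us(unsigned long us)
{
	uint64_t end = now_ns(CLOCK_MONOTONIC) + (uint64_t)us * 1000ULL;

	while (now_ns(CLOCK_MONOTONIC) < end)
		;
}

/* usleep-like blocking sleep: the CPU is free for other work meanwhile. */
static void sleep_us(unsigned long us)
{
	struct timespec req = { (time_t)(us / 1000000UL),
				(long)(us % 1000000UL) * 1000L };
	struct timespec rem;

	/* On EINTR, resume with the remaining time nanosleep() reported. */
	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;
}

struct delay_cost {
	uint64_t wall_ns;  /* elapsed CLOCK_MONOTONIC time */
	uint64_t cpu_ns;   /* CPU time charged to this process */
};

static struct delay_cost measure(void (*delay)(unsigned long),
				 unsigned long us)
{
	uint64_t w0 = now_ns(CLOCK_MONOTONIC);
	uint64_t c0 = now_ns(CLOCK_PROCESS_CPUTIME_ID);
	struct delay_cost r;

	delay(us);
	r.wall_ns = now_ns(CLOCK_MONOTONIC) - w0;
	r.cpu_ns = now_ns(CLOCK_PROCESS_CPUTIME_ID) - c0;
	return r;
}
```

On an otherwise idle machine one would expect measure(spin_us, 2000)
to report CPU time close to its wall time, and measure(sleep_us, 2000)
to report only a small fraction of it; the wall time of the sleep also
includes some wake-up lateness.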

My question, then, is whether a function such as usleep, replacing some
of the longer calls to udelay, would be "worth it", and where an
appropriate cut-off between udelay and usleep would lie. The first two
questions that come to mind are:

What is the approximate cost of a context switch, quantified?

I found 2 papers circa 2007 that seek to answer this, though they
are a bit dated:

http://www.cs.rochester.edu/u/cli/research/switch.pdf
IBM eServer, dual 2.0GHz Pentium Xeon; 512 KB L2, cache line 128B
Linux 2.6.17, Red Hat Linux 9, gcc 3.2.2 (-O0)
3.8 us / context switch

http://delivery.acm.org/10.1145/1290000/1281703/a3-david.pdf
ARMv5, ARM926EJ-S on an OMAP1610 (set to 120MHz clock)
Linux 2.6.20-rc5-omap1
48 us / context switch
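A crude way to turn these numbers into a cut-off: assume that sleeping
costs about two context switches (one to switch away, one to switch
back) while spinning costs the whole delay. Under that assumption,
which ignores hrtimer programming, cache effects, and whether anything
else is runnable at all, sleeping only pays off once the delay exceeds
twice the context-switch cost. breakeven_us() and sleep_pays_off() are
illustrative names.

```c
#include <assert.h>

/* Hypothetical rule of thumb: a sleep costs about two context switches
 * (switch away, switch back), while spinning costs the whole delay. */
static double breakeven_us(double switch_cost_us)
{
	assert(switch_cost_us >= 0.0);
	return 2.0 * switch_cost_us;
}

/* Nonzero if, under this model, sleeping frees more CPU than it costs. */
static int sleep_pays_off(double delay_us, double switch_cost_us)
{
	return delay_us > breakeven_us(switch_cost_us);
}
```

With the figures above this puts the break-even at roughly 7.6us on the
Xeon and 96us on the 120MHz ARM926, so under this model udelay(100) is
worth converting on both, while udelay(1~5) is not.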

What is the overhead of an hrtimer? Would it be 'bad' to start using
a lot more of them in sleeps?

From what I could tell, the overhead of hrtimers is fairly
negligible, but that is just from reading documentation; numbers
or experience from anyone here would be greatly appreciated.
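One data point that is easy to collect from userspace is how late an
hrtimer-backed sleep actually wakes up, since on Linux nanosleep() and
clock_nanosleep() are implemented on top of hrtimers. The sketch below
averages that lateness. It measures what a caller experiences
(syscall, timer, and scheduling latency, plus the per-thread timer
slack, which defaults to 50us for normal tasks) rather than the cost of
the hrtimer itself; mean_overshoot_ns() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Difference b - a in nanoseconds. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (int64_t)(b->tv_nsec - a->tv_nsec);
}

/* Average lateness, in ns, of `iters` relative sleeps of `us` usecs each. */
static int64_t mean_overshoot_ns(unsigned long us, int iters)
{
	int64_t total = 0;

	assert(iters > 0 && us < 1000000UL);  /* keep tv_nsec in range */
	for (int i = 0; i < iters; i++) {
		struct timespec req = { 0, (long)us * 1000L }, rem, t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		/* clock_nanosleep() returns the error number directly. */
		while (clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem) == EINTR)
			req = rem;
		clock_gettime(CLOCK_MONOTONIC, &t1);
		total += ts_diff_ns(&t0, &t1) - (int64_t)us * 1000;
	}
	return total / iters;
}
```

Running mean_overshoot_ns(100, 1000) on the target board would give a
rough idea of how much extra latency a usleep(100) might add compared
with udelay(100).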


As for why this isn't built on top of do_nanosleep: first, the range
in usleep_range seems very valuable for power applications. Many of
these delays are simply waiting for something to complete, so I would
prefer that they not instigate a wake-up of their own when some other
event already wakes the CPU inside the allowed window. Second,
do_nanosleep seems to be built as the back end for the user-space
nanosleep call, so it did not seem like a good fit here.
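The power argument can be illustrated with an idealized model: each
sleeper may be woken anywhere in its [min, max] window, the timer
hardware is programmed for the earliest pending hard expiry (max), and
every sleeper whose window has already opened is woken by that same
interrupt. Assuming no other wake-up sources, a greedy pass sorted by
max counts the wake-ups needed. struct window and count_wakeups() are
hypothetical; this models the intent, not the kernel's hrtimer code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A sleeper that may be woken anywhere in [min_us, max_us]. */
struct window {
	unsigned long min_us;
	unsigned long max_us;
};

static int by_max(const void *a, const void *b)
{
	const struct window *x = a, *y = b;

	return (x->max_us > y->max_us) - (x->max_us < y->max_us);
}

/* Greedy: fire at the earliest remaining hard expiry and wake every
 * sleeper whose window has opened by then. Sorts `w` in place and
 * returns the number of wake-ups needed. */
static size_t count_wakeups(struct window *w, size_t n)
{
	size_t wakeups = 0;
	int have_fire = 0;
	unsigned long fire = 0;

	qsort(w, n, sizeof *w, by_max);
	for (size_t i = 0; i < n; i++) {
		assert(w[i].min_us <= w[i].max_us);
		if (have_fire && w[i].min_us <= fire)
			continue;        /* served by the previous wake-up */
		fire = w[i].max_us;      /* next hard expiry to program */
		have_fire = 1;
		wakeups++;
	}
	return wakeups;
}
```

Three exact sleeps at 100, 120 and 150us need three wake-ups; giving
each of them 100us of slack lets a single wake-up at 200us serve all
three.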


Lastly, it is worth noting that there are many calls to udelay that
*cannot* be switched to usleep; udelay is often used to implement some
degree of bit-banging, which would not take kindly to the less
predictable wake-up time of usleep. This idea is *not* intended for
those applications.
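The distinction above can be written down as a small policy helper.
This is a hypothetical sketch, not a proposed API: timing_critical
stands for cases like bit-banging, may_sleep is an added assumption
covering atomic context (where sleeping is not allowed at all), and
threshold_us would come from measured context-switch cost on the
target.

```c
#include <assert.h>
#include <stdbool.h>

enum delay_kind { USE_UDELAY, USE_USLEEP_RANGE };

/* Hypothetical policy helper for choosing between spinning and sleeping. */
static enum delay_kind pick_delay(unsigned long us, bool timing_critical,
				  bool may_sleep, unsigned long threshold_us)
{
	/* Bit-banging needs tight timing; atomic context cannot sleep. */
	if (timing_critical || !may_sleep)
		return USE_UDELAY;
	/* Otherwise sleep only once the delay is long enough to pay off. */
	return us >= threshold_us ? USE_USLEEP_RANGE : USE_UDELAY;
}
```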


I appreciate any insight!

-Pat

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum