Re: further..

Andrew Tridgell (
Wed, 31 Jul 1996 23:40:08 +1000

> Also, has anyone ever managed to compile a call tracer into the
> kernel? i.e. something like the equivalent of gprof()? Doesn't look
> too hard as far as I can see, but I don't know how gcc -p would handle
> in-lined assembly (and vice versa).

I hacked up something along these lines for sparclinux. I used it for
optimising the sun4c code recently.

It works like this:

#define KGPROF_DEPTH 3 /* this needs to match the code below */
#define KGPROF_SIZE 100
static struct {
	unsigned addr[KGPROF_DEPTH];
	unsigned count;
} kgprof_counters[KGPROF_SIZE];

/* just call this function from whatever function you think needs it then
look at /proc/cpuinfo to see where the function is being called from
and how often. This gives a type of "kernel gprof" */
#define NEXT_PROF(prev,lvl) ((prev)>PAGE_OFFSET?__builtin_return_address(lvl):0)
static inline void kgprof_profile(void)
{
	unsigned ret[KGPROF_DEPTH];
	int i,j;
	/* you can't use a variable argument to __builtin_return_address() */
	ret[0] = (unsigned)__builtin_return_address(0);
	ret[1] = (unsigned)NEXT_PROF(ret[0],1);
	ret[2] = (unsigned)NEXT_PROF(ret[1],2);

	/* find the slot that already holds this call chain, or the first free one */
	for (i=0;i<KGPROF_SIZE && kgprof_counters[i].addr[0];i++) {
		for (j=0;j<KGPROF_DEPTH;j++)
			if (ret[j] != kgprof_counters[i].addr[j]) break;
		if (j==KGPROF_DEPTH) break;
	}
	/* if the table is full the sample is silently dropped */
	if (i<KGPROF_SIZE) {
		for (j=0;j<KGPROF_DEPTH;j++)
			kgprof_counters[i].addr[j] = ret[j];
		kgprof_counters[i].count++;
	}
}

I then added code in the sun4c /proc/cpuinfo dump routine to dump the
non-zero elements of kgprof_counters. You could also use gdb on the
live kernel to examine it if you want.
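
The dump routine itself isn't shown here. As a rough idea of what such a
routine might look like, here is a sketch written as ordinary user-space
C so that it compiles on its own. The name kgprof_dump(), the
buffer-based interface and the output format are all assumptions made
for illustration; this is not the actual sun4c /proc/cpuinfo code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define KGPROF_DEPTH 3
#define KGPROF_SIZE 100

/* Same table layout as in the message. */
static struct {
	unsigned addr[KGPROF_DEPTH];
	unsigned count;
} kgprof_counters[KGPROF_SIZE];

/*
 * Hypothetical helper: write one line per used slot into buf, as
 * "kgprof: <addr0> <addr1> <addr2> : <count>".
 * Returns the number of bytes written, not counting the trailing NUL.
 */
static int kgprof_dump(char *buf, size_t size)
{
	size_t len = 0;
	int i, j;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	/* Slots are filled in order, so the first empty addr[0] ends the table. */
	for (i = 0; i < KGPROF_SIZE && kgprof_counters[i].addr[0]; i++) {
		char line[64];	/* longest possible line is 48 bytes */
		int n = snprintf(line, sizeof(line), "kgprof:");

		for (j = 0; j < KGPROF_DEPTH; j++)
			n += snprintf(line + n, sizeof(line) - (size_t)n,
				      " %08x", kgprof_counters[i].addr[j]);
		n += snprintf(line + n, sizeof(line) - (size_t)n,
			      " : %u\n", kgprof_counters[i].count);

		if (len + (size_t)n >= size)
			break;	/* out of room: stop rather than cut a line short */
		memcpy(buf + len, line, (size_t)n + 1);
		len += (size_t)n;
	}
	return (int)len;
}
```

The addresses printed this way can then be matched by hand against the
kernel's System.map to see which functions the calls came from.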

To use the above code just set the depth of the traceback you want
(KGPROF_DEPTH, which has to match the number of ret[] assignments) and
add a call to kgprof_profile() in the function you want to know about.
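
To make the usage concrete, here is a cut-down user-space sketch of the
same idea. It is not the kernel code: it records only one level of
return address (walking further up the stack is not reliable in an
ordinary user-space build), it uses GCC/Clang attributes to control
inlining, and flush_cache(), caller_a(), caller_b() and kgprof_print()
are made-up names standing in for a real expensive routine, its callers
and a dump routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define KGPROF_SIZE 16

/* One return address per slot (depth 1 only in this sketch). */
static struct {
	uintptr_t addr;
	unsigned count;
} kgprof_counters[KGPROF_SIZE];

/*
 * Forced inline so that __builtin_return_address(0) reports who called
 * the routine being profiled, not who called kgprof_profile().
 */
static inline __attribute__((always_inline)) void kgprof_profile(void)
{
	uintptr_t ret = (uintptr_t)__builtin_return_address(0);
	int i;

	/* find the slot for this caller, or the first free one */
	for (i = 0; i < KGPROF_SIZE && kgprof_counters[i].addr; i++)
		if (kgprof_counters[i].addr == ret)
			break;
	if (i < KGPROF_SIZE) {
		kgprof_counters[i].addr = ret;
		kgprof_counters[i].count++;
	}
}

static volatile unsigned work;	/* stand-in for real work */

/* The routine we know is expensive: this is where the call goes. */
__attribute__((noinline)) static void flush_cache(void)
{
	kgprof_profile();
	work++;
}

/*
 * Two hypothetical callers. The work after the call keeps the compiler
 * from turning the call into a tail jump, which would hide the caller.
 */
__attribute__((noinline)) static void caller_a(void) { flush_cache(); work += 1; }
__attribute__((noinline)) static void caller_b(void) { flush_cache(); work += 2; }

/* Print every used slot: return address and hit count. */
static void kgprof_print(void)
{
	int i;

	for (i = 0; i < KGPROF_SIZE && kgprof_counters[i].addr; i++)
		printf("%#lx : %u\n", (unsigned long)kgprof_counters[i].addr,
		       kgprof_counters[i].count);
}
```

Calling caller_a() three times and caller_b() twice, then
kgprof_print(), should show two lines: one return address inside each
caller, with counts 3 and 2.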

This isn't a "general" trace generator, as you need to target a
specific routine, but it's useful when you know which routine is
expensive but you don't know who is calling it. Eddie and I used
this to find out where all the cache flushes were coming from on the
sun4c, and we then found some ways to reduce the cache flush costs.

I've only tried the above on sparc, but I believe
__builtin_return_address() is available on Intel boxes too.

Cheers, Andrew