Re: 7.52 second kernel compile

From: Cort Dougan (cort@fsmlabs.com)
Date: Mon Mar 18 2002 - 17:36:37 EST


Here's the version modified for PPC, along with the results.

The cycle timer in this case is about 16.6MHz.

# ./foo
  92.01: 1
   7.98: 2
# ./foo
   3.71: 0
  92.30: 1
   3.99: 2
# ./foo
  92.01: 1
   7.97: 2
# ./foo
  92.01: 1
   7.97: 2
# ./foo
   3.71: 0
  92.30: 1
   3.99: 2
# ./foo
   3.71: 0
  92.30: 1
   3.99: 2

#include <stdio.h>
#include <stdlib.h>

#if defined(__powerpc__)
#define rdtsc(low) \
   __asm__ __volatile__ ("mftb %0": "=r" (low))
#else
#define rdtsc(low) \
  __asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
#endif

#define MAXTIMES 1000
#define BUFSIZE (128*1024*1024)
#define access(x) (*(volatile unsigned int *)&(x))

int main()
{
        unsigned int i, j;
        static int times[MAXTIMES];
        char *buffer = malloc(BUFSIZE);

        for (i = 0; i < BUFSIZE; i += 4096)
                access(buffer[i]);
        for (i = 0; i < MAXTIMES; i++)
                times[i] = 0;
        for (j = 0; j < 100; j++) {
                for (i = 0; i < BUFSIZE; i += 4096) {
                        unsigned long start, end;

                        rdtsc(start);
                        access(buffer[i]);
                        rdtsc(end);
                        end -= start;
                        if (end >= MAXTIMES)
                                end = MAXTIMES-1;
                        times[end]++;
                }
        }
        for (i = 0; i < MAXTIMES; i++) {
                int count = times[i];
                double percent = (double)count / (BUFSIZE/4096);
                if (percent < 1)
                        continue;
                printf("%7.2f: %d\n", percent, i);
        }
        return 0;
}

} Btw, here's a program that does a simple histogram of TLB miss cost, and
} shows the interesting pattern on intel I was talking about: every 8th miss
} is most costly, apparently because Intel pre-fetches 8 TLB entries at a
} time.
}
} So on a PII core, you'll see something like
}
} 87.50: 36
} 12.39: 40
}
} ie 87.5% (exactly 7/8) of the TLB misses take 36 cycles, while 12.4% (ie
} 1/8) takes 40 cycles (and I assume that the extra 4 cycles is due to
} actually loading the thing from the data cache).
}
} Yeah, my program might be buggy, so take the numbers with a pinch of salt.
} But it's interesting to see how on an athlon the numbers are
}
} 3.17: 59
} 34.94: 62
} 4.71: 85
} 54.83: 88
}
} ie roughly 60% take 85-90 cycles, and 40% take ~60 cycles. I don't know
} where that pattern would come from..
}
} What are the ppc numbers like (after modifying the rdtsc implementation,
} of course)? I suspect you'll get a less clear distribution depending on
} whether the hash lookup ends up hitting in the primary or secondary hash,
} and where in the list it hits, but..
}
} Linus



This archive was generated by hypermail 2b29 : Sat Mar 23 2002 - 22:00:16 EST