Re: Linux 2.4 VS 2.6 fork VS thread creation time test

From: Gergely Czuczy
Date: Sun May 23 2004 - 04:03:58 EST


On Sun, 23 May 2004, Nick Piggin wrote:

> Gergely Czuczy wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hello everyone,
> >
> > Today morning I've made a test about the thead and child process
> > creation(fork) time on both 2.4 and 2.6 kernels.
> >
> > The test systems
> > ================
> >
> > Box A:
> > - kernel: Linux 2.4.22-xfs
> > - CPU: Intel P3-700 MHz (slot1)
> > - Ram: 384MB SDR SDRAM
> >
> > Box B:
> > - kernel: Linux 2.6.6-mm5
> > - CPU: Intel P4-2.6Ghz
> > - Ram: 512 DDR SDRAM
> >
> > Box B is a lot better, but it doesn't metter according to the test,
> > because the aim was to get the ratio of the two times with different
> > sample rates.
> >
>
> Even so, you really should be using the same system if you want
> comparable results. The pentium 4 in particular can do worse at
> specific things in microbenchmarks.
It doesn't matter. That's why you should look at the "ratio" at the end
and not on the pure numbers, they are just bonus "information"

>
> Your problem with not being able large numbers of threads and
ohh wait, i don't have problems, i've just noticed the things!

> processes could be ulimits or sysctl limits. Look at ulimit
> and /proc/sys/kernel/threads-max and pid_max.
threads-max is about 8179. this is why I don't see why it is limited to
255 threads.


>
> [ snip numbers ]
>
> > Conlusion
> > =========
> >
> > It's easy to notice that in case of 2.4 the ratios of the creation times
> > are converges to 1, so it depends on the load, while in case of a 2.6
> > kernel the ratios are mostly fix, about 9. This means that creating a new
> > child process takes much more time than creating a new thread.
> >
>
> I tried your code on a P3-1000. 2.4.23 is not 100% comparable because it
> was built with an old compiler and possibly different options.
>
> Anyway, results with 256 samples were as follows:
> processes 256
> 2.4.23:
> Total fork time: 48.224 msecs
> Total thread time: 30.180 msecs
> Total fork/thread ratio: 1.598
>
> 2.6.6-mm5:
> Total fork time: 40.795 msecs
> Total thread time: 17.615 msecs
> Total fork/thread ratio: 2.316
>
> 2.6.6-mm5+child-runs-last:
> Total fork time: 16.080 msecs
> Total thread time: 8.600 msecs
> Total fork/thread ratio: 1.870
This is very nice, but notice the difference between the total and the
avarage time. Total time includes all the time that taken by the test, and
The avarage/relative time reflect he time taken by the successive calls,
eg it not includes failed pthread_create()s and fork()s, which ones makes
the test false. After posting the test I've realized that the measurement
of the realitve time is not good, so I will patch this problem.


>
> 2.6 introduced an optimisation called child-runs-first. Apparently
> it is nice for many common things, but it also causes a context
> switch at every fork(), so sucks for these microbenchmarks. Child-
> runs-last simply reverts that. Patch is attached if anyone is
> interested.
>


Bye,

Gergely Czuczy
mailto: phoemix@xxxxxxxxxxx
PGP: http://phoemix.harmless.hu/phoemix.pgp

"Wish a god, a star, to believe in,
With the realm of king of fantasy..."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/