Thanks for the benchmark. My P120 at work (2.0.33) spit out 3057 cycles, and
my P233MMX (2.0.33) at home spit out 3157 and then 3492 ... odd. Incidentally, I
was going to run it on my Alpha, but then my brain kicked in and I realized
that it's i586+ asm (duh).
> That's not very cheap, but if you look at the clone() path, almost all of
> the overhead comes from arch/i386/kernel/process.c:copy_thread(). You won't
> get around that function no matter what threading abstraction you use,
> unless you sacrifice some features.
>
> [The function is not quite lightweight, but it could be improved: eg. the
> clearing of the IO bitmap could be delayed until the first ioperm() call
> by setting p->tss.bitmap outside of the TSS segment]
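The delayed-clearing idea above can be mocked up in plain C. All names and sizes here are illustrative, not the actual arch/i386 code: fork/clone merely points the bitmap offset past the TSS limit (so any port access faults), and the expensive memset happens only on the first ioperm():

```c
#include <stddef.h>
#include <string.h>

#define IO_BITMAP_SIZE 32		/* in longs; illustrative only */
#define INVALID_IO_BITMAP_OFFSET 0x8000	/* points past the TSS limit */

/* A mock of the relevant corner of the i386 TSS. */
struct mock_tss {
	unsigned short bitmap;		/* offset of io_bitmap within the TSS */
	unsigned long io_bitmap[IO_BITMAP_SIZE + 1];
};

/* The clone()/fork() path: skip the memset entirely, just mark the
 * bitmap invalid.  This is the cheap part. */
static void copy_thread_lazy(struct mock_tss *t)
{
	t->bitmap = INVALID_IO_BITMAP_OFFSET;
}

/* First ioperm() call: only now pay for clearing the bitmap
 * (a set bit means the port is denied). */
static void ioperm_first_use(struct mock_tss *t)
{
	if (t->bitmap == INVALID_IO_BITMAP_OFFSET) {
		memset(t->io_bitmap, 0xff, sizeof(t->io_bitmap));
		t->bitmap = (unsigned short)offsetof(struct mock_tss, io_bitmap);
	}
}
```

Since most tasks never call ioperm(), most tasks would never pay for the bitmap at all.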
And if it used LWPs, the ioperm state would exist solely in the process
container. Anything that isn't context-related would live in the process
container. It would seem that the overhead is due to the design, not the
call. For all intents and purposes, LWPs could be implemented transparently
behind the clone() call, simply changing the semantics of creation and the
like. The threads could be scheduled in process groups (didn't someone add
QNX-style process groups to 2.1?), with each process group confined to a
single process.
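A rough sketch of that split, with entirely hypothetical names: the process container owns everything shared (ioperm state, credentials, and so on), and each LWP carries only what the scheduler needs to context-switch it:

```c
#include <sys/types.h>

/* Shared process container: one per process group.  Holds everything
 * that is not execution context, so clone()-style creation never has
 * to copy or clear it.  (Hypothetical layout, not kernel code.) */
struct proc_container {
	unsigned short io_bitmap_offset;	/* ioperm state lives here, once */
	uid_t uid;
	int refcount;				/* one reference per LWP */
};

/* Per-LWP state: only the execution context. */
struct lwp {
	unsigned long sp, ip;			/* stack/instruction pointers */
	struct proc_container *proc;		/* back-pointer to shared state */
};

/* Creating an LWP just links it to the container and bumps the
 * refcount -- no per-thread copy of the container state. */
static struct lwp *lwp_attach(struct proc_container *pc, struct lwp *l)
{
	l->proc = pc;
	pc->refcount++;
	return l;
}
```

The point of the layout is that creating a second LWP touches nothing in the container, which is where the copy_thread() cost would otherwise go.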
>
> i've attached the code that does the clone() latency measurement.
>
> -- mingo
>
--Perry
--
Perry Harrington (perry@apsoft.com), APSoft
Linux rules all OSes. Think Blue.