[patch] x86 clone() speedup, 2.1.90-pre2

MOLNAR Ingo (mingo@chiara.csoma.elte.hu)
Thu, 12 Mar 1998 13:13:30 +0100 (CET)


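This patch implements 'delayed IO-bitmap clearing': the clone() path no
longer fills the per-thread IO bitmap, the first sys_ioperm() call sets it
up instead. Benchmarking clone() thread creation latency on a 100 MHz P5 UP
gives: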

without patch:

hell:~> ./lat_clone
best clone() latency: 3316 cycles

with patch:

hell:~> ./lat_clone
best clone() latency: 2941 cycles

A ~13% speedup (3316/2941 is about 1.13; the latency itself drops by ~11%).
I think Linux thus holds the x86 kernel thread creation speed world
record? ;)

[My 100 MHz P5 does 9208 create+wait+exit threads/sec; this should scale
to something like 50K threads/sec on high-end x86 systems. More than 70%
of the overhead is now in the wait()/exit() part.]
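
The numbers above can be cross-checked with a little arithmetic. The
following is only a sketch: it assumes the clock is exactly 100 MHz and that
the cycle counts from lat_clone are comparable with the threads/sec figure.
The helper names are made up for illustration and are not part of the patch.

```c
#include <stdio.h>

/* Cycles available per create+wait+exit iteration (assumes a fixed clock). */
static double cycles_per_thread(double clock_hz, double threads_per_sec)
{
	return clock_hz / threads_per_sec;
}

/* Fraction of the per-thread cycle budget that is NOT spent in clone(). */
static double non_clone_share(double total_cycles, double clone_cycles)
{
	return 1.0 - clone_cycles / total_cycles;
}

/* Throughput-style speedup: old latency over new latency, minus one. */
static double speedup(double cycles_before, double cycles_after)
{
	return cycles_before / cycles_after - 1.0;
}

/* Print the breakdown for the figures quoted in this mail. */
static void print_breakdown(void)
{
	double total = cycles_per_thread(100e6, 9208.0);

	printf("cycles per create+wait+exit: %.0f\n", total);
	printf("share outside clone():       %.1f%%\n",
	       100.0 * non_clone_share(total, 2941.0));
	printf("clone() speedup:             %.1f%%\n",
	       100.0 * speedup(3316.0, 2941.0));
}
```

With these inputs one iteration costs roughly 10860 cycles, of which clone()
takes 2941, so a bit under 73% is left for wait()/exit() and the rest of the
loop.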

I've tested sys_ioperm() functionality, and it seems to work fine in all
the important cases.
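
For readers who want to play with the idea outside the kernel, here is a
minimal user-space sketch of the delayed setup. It is a toy model, not the
kernel code: struct toy_thread, the toy_* functions and the BITMAP_WORDS
value of 32 are assumptions made up for illustration, and there is no TSS or
SIGSEGV involved. A thread starts with its bitmap offset pointing past the
end of the structure, and only the first ioperm-style call pays for the
memset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_WORDS 32	/* stands in for IO_BITMAP_SIZE; the value is assumed */

struct toy_thread {
	size_t   bitmap;			/* bitmap offset, or sizeof() if unset */
	uint32_t io_bitmap[BITMAP_WORDS + 1];	/* one spare word, as in the patch */
};

#define TOY_BITMAP_OFFSET offsetof(struct toy_thread, io_bitmap)

/* Cheap "clone" path: mark the bitmap as absent, do not touch the array. */
void toy_thread_create(struct toy_thread *t)
{
	t->bitmap = sizeof(struct toy_thread);
}

int toy_bitmap_is_set_up(const struct toy_thread *t)
{
	return t->bitmap == TOY_BITMAP_OFFSET;
}

/* Slow path: the first call fills the bitmap with 1s (all ports denied). */
int toy_ioperm(struct toy_thread *t, unsigned long from, unsigned long num,
	       int turn_on)
{
	unsigned long port;

	if ((from + num <= from) || (from + num > BITMAP_WORDS * 32))
		return -1;
	if (!toy_bitmap_is_set_up(t)) {
		t->bitmap = TOY_BITMAP_OFFSET;
		memset(t->io_bitmap, 0xff, sizeof(t->io_bitmap));
	}
	for (port = from; port < from + num; port++) {
		uint32_t mask = UINT32_C(1) << (port % 32);

		if (turn_on)
			t->io_bitmap[port / 32] &= ~mask;	/* 0 bit = allowed */
		else
			t->io_bitmap[port / 32] |= mask;	/* 1 bit = denied */
	}
	return 0;
}

/* A port is usable only if the bitmap exists and its bit is cleared. */
int toy_port_allowed(const struct toy_thread *t, unsigned long port)
{
	if (!toy_bitmap_is_set_up(t) || port >= BITMAP_WORDS * 32)
		return 0;
	return !((t->io_bitmap[port / 32] >> (port % 32)) & 1u);
}
```

Calling toy_thread_create() and then toy_ioperm(&t, 0x378, 3, 1) would leave
ports 0x378-0x37a allowed and everything else denied, while a thread that
never asks for port access never pays for filling the array.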

-- mingo

--- linux/arch/i386/kernel/process.c.orig Tue Mar 17 23:05:24 1998
+++ linux/arch/i386/kernel/process.c Wed Mar 18 00:48:40 1998
@@ -510,9 +517,13 @@
 		set_ldt_desc(gdt+(nr<<1)+FIRST_LDT_ENTRY,p->ldt, 512);
 	else
 		set_ldt_desc(gdt+(nr<<1)+FIRST_LDT_ENTRY,&default_ldt, 1);
-	p->tss.bitmap = offsetof(struct thread_struct,io_bitmap);
-	for (i = 0; i < IO_BITMAP_SIZE+1 ; i++) /* IO bitmap is actually SIZE+1 */
-		p->tss.io_bitmap[i] = ~0;
+	/*
+	 * a bitmap offset pointing outside of the TSS limit causes a nicely
+	 * controllable SIGSEGV. The first sys_ioperm() call sets up the
+	 * bitmap properly.
+	 */
+	p->tss.bitmap = sizeof(struct thread_struct);
+
 	if (last_task_used_math == current)
 		__asm__("clts ; fnsave %0 ; frstor %0":"=m" (p->tss.i387));

--- linux/arch/i386/kernel/ioport.c.orig Wed Mar 18 00:41:39 1998
+++ linux/arch/i386/kernel/ioport.c Wed Mar 18 00:58:33 1998
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/smp.h>
 #include <linux/smp_lock.h>
+#include <linux/stddef.h>

 /* Set EXTENT bits starting at BASE in BITMAP to value TURN_ON. */
 static void set_bitmap(unsigned long *bitmap, short base, short extent, int new_value)
@@ -53,12 +54,25 @@
  */
 asmlinkage int sys_ioperm(unsigned long from, unsigned long num, int turn_on)
 {
+	struct thread_struct * t = &current->tss;
+
 	if ((from + num <= from) || (from + num > IO_BITMAP_SIZE*32))
 		return -EINVAL;
 	if (!suser())
 		return -EPERM;
+	/*
+	 * If it's the first ioperm() call in this thread's lifetime, set the
+	 * IO bitmap up. ioperm() is much less timing critical than clone(),
+	 * this is why we delay this operation until now:
+	 */
+#define IO_BITMAP_OFFSET offsetof(struct thread_struct,io_bitmap)
+
+	if (t->bitmap != IO_BITMAP_OFFSET) {
+		t->bitmap = IO_BITMAP_OFFSET;
+		memset(t->io_bitmap,0xff,(IO_BITMAP_SIZE+1)*4);
+	}

-	set_bitmap((unsigned long *)current->tss.io_bitmap, from, num, !turn_on);
+	set_bitmap((unsigned long *)t->io_bitmap, from, num, !turn_on);
 	return 0;
 }
