I've done some basic timing tests for shared page tables using a simple
fork test I wrote. It has three modes:

The first mode forks as fast as it can, then calculates how long each fork
took. This measures the cost of the fork() system call alone.
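A minimal Python sketch of this first mode follows. The original test program isn't included here, so the helper name `time_fork_only` and the choice to report the mean of one timed loop are assumptions. Children exit at once and are reaped only after the timed loop, so only the parent's fork() calls are timed. Interpreter overhead means absolute numbers won't match a C test.

```python
import os
import time


def time_fork_only(n: int) -> float:
    """Hypothetical helper: average milliseconds per fork() call.

    Each child exits immediately. The parent does not wait inside the timed
    loop, so the measurement covers fork() itself, not child teardown.
    """
    pids = []
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once; os._exit skips the parent's Python cleanup.
            os._exit(0)
        pids.append(pid)
    elapsed = time.perf_counter() - start
    # Reap all children outside the timed region so no zombies are left.
    for pid in pids:
        os.waitpid(pid, 0)
    return elapsed * 1000.0 / n


if __name__ == "__main__":
    print(f"fork: {time_fork_only(50):.3f} ms per call")
```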

The second mode adds a wait() for the child after the fork. The child just
calls exit(0). This adds the time for the child to run and exit.
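The fork/exit mode can be sketched the same way. This is an assumed reconstruction, not the author's program, and `time_fork_exit` is a hypothetical name. The parent waits for each child before forking the next, so every timed iteration includes creating the child, letting it call exit, and tearing it down.

```python
import os
import time


def time_fork_exit(n: int) -> float:
    """Hypothetical helper: average milliseconds per fork()+wait() cycle."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: the equivalent of exit(0); os._exit avoids running the
            # parent's Python-level cleanup inside the child.
            os._exit(0)
        # Parent: block until this child has exited and been torn down.
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError(f"child {pid} ended abnormally (status {status})")
    return (time.perf_counter() - start) * 1000.0 / n


if __name__ == "__main__":
    print(f"fork/exit: {time_fork_exit(50):.3f} ms per cycle")
```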

The third mode has the child exec() a very small executable that just
exits. This adds the exec() time to the mix.
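For the fork/exec/exit mode, the sketch below assumes the system's `true` program as the "very small executable". The post doesn't name the actual binary, and the helper name `time_fork_exec_exit` is hypothetical. The `try`/`finally` ensures a child whose exec fails exits with status 127 instead of continuing to run the parent's code.

```python
import os
import shutil
import time

# Assumption: stand in for the tiny test executable with `true`,
# which does nothing but exit with status 0.
TRUE_PATH = shutil.which("true") or "/bin/true"


def time_fork_exec_exit(n: int, path: str = TRUE_PATH) -> float:
    """Hypothetical helper: average milliseconds per fork()+exec()+wait() cycle."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            try:
                # Child: replace this process image with the tiny program.
                os.execv(path, [path])
            finally:
                # Only reached if execv() failed; never fall back into parent code.
                os._exit(127)
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError(f"exec of {path} failed (status {status})")
    return (time.perf_counter() - start) * 1000.0 / n


if __name__ == "__main__":
    print(f"fork/exec/exit: {time_fork_exec_exit(50):.3f} ms per cycle")
```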

The program also optionally allocates a shared memory object and touches
all the pages in it before the start of the test. This adds extra pages to
be dealt with by fork/exec/exit. None of the pages are touched after the
test starts.
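The optional memory setup can be approximated as shown below. The post doesn't say which kind of shared memory object it uses, so this sketch assumes an anonymous `MAP_SHARED` mapping. The helper name `make_touched_shared_region` is hypothetical. Writing one byte per page faults every page in before any fork, so each child inherits a fully populated mapping.

```python
import mmap


def make_touched_shared_region(size_bytes: int) -> mmap.mmap:
    """Hypothetical helper: create a shared anonymous mapping and touch every page.

    Must be called before the timed test starts; nothing touches the pages later.
    """
    region = mmap.mmap(
        -1,  # anonymous mapping, not backed by a file
        size_bytes,
        flags=mmap.MAP_SHARED,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    # Write one byte in each page so the kernel actually allocates it.
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        region[offset] = 1
    return region
```

For example, `make_touched_shared_region(40 * 1024 * 1024)` would correspond to the 40M case.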

I ran this test on three kernels: 2.5.41, 2.5.41-mm2 without shared page
tables, and 2.5.41-mm2 with shared page tables.

Now for the results (all times are in ms; each row gives the size of the
shared memory object):

                2.5.41  mm2-unshared  mm2-shared
                ------  ------------  ----------
fork
----
400K               1.7           1.6         0.5
4M                 5.0           5.0         3.4
40M               28.4          29.5         3.4

fork/exit
---------
400K               1.7           1.6         1.6
4M                 4.9           5.3         4.1
40M               44.2          45.1         4.1

fork/exec/exit
--------------
400K               6.5           7.5         7.7
4M                10.3          11.9        10.7
40M               49.3          51.4        10.7

I don't yet know why exec adds a small penalty for small tasks under mm2
(at 400K, fork/exec/exit takes 7.5 ms unshared and 7.7 ms shared versus
6.5 ms on 2.5.41). I'm working on some optimizations that might help.

