drivers/char/random.c: More futzing about

From: George Spelvin
Date: Mon Jun 09 2014 - 09:17:46 EST


Just as an example of some more ambitious changes I'm playing with...

I really think the polynomial + twist has outlived its usefulness.
In particular, table lookups in infrequently accessed code are called
D-cache misses and are undesirable. And the input_rotate is an ugly
kludge to compensate for inadequate mixing.

Here's an example of a smaller, faster, and better fast_mix() function.
The mix is invertible (thus preserving entropy), but causes each input
bit or pair of bits to avalanche to at least 43 bits after 2 rounds and
120 bit0 after 3.

For comparison, with the current linear fast_mix function, bits above
the 12th in the data words only affect 4 other bits after one repetition.

With 3 iterations, it runs in 2/3 the time of the current fast_mix
and is half the size: 84 bytes of object code as opposed to 168.

void fast_mix2(struct fast_pool *f, u32 const input[4])
{
u32 a = f->pool[0] ^ input[0], b = f->pool[1] ^ input[1];
u32 c = f->pool[2] ^ input[2], d = f->pool[3] ^ input[3];
int i;

for (i = 0; i < 3; i++) {
/*
* Inspired by ChaCha's QuarterRound, but
* modified for much greater parallelism.
* Surprisingly, rotating a and c seems to work
* better than b and d. And it runs faster.
*/
a += b; c += d;
d ^= a; b ^= c;
a = rol32(a, 15); c = rol32(c, 21);

a += b; c += d;
d ^= a; b ^= c;
a = rol32(a, 3); c = rol32(c, 7);
}
f->pool[0] = a; f->pool[1] = b;
f->pool[2] = c; f->pool[3] = d;
f->count++;
}

Other good rotate sets:
score = 43/120/121: 23 6 18 11
score = 42/120/123: 17 15 4 24
score = 42/120/122: 4 7 19 17
score = 43/122/122: 24 6 16 12
score = 42/121/122: 25 22 11 8
score = 42/122/121: 16 20 11 23
score = 43/120/122: 8 11 17 19
score = 46/121/123: 15 21 3 7
score = 43/120/122: 7 11 15 21
score = 42/120/122: 7 10 17 13
score = 42/120/123: 11 3 18 23
score = 44/121/122: 20 10 26 24
score = 42/123/122: 10 5 18 21
score = 44/120/122: 26 21 7 9
score = 42/121/122: 21 11 7 8
score = 42/120/122: 11 11 27 19
score = 42/121/122: 6 18 4 27
score = 42/121/122: 13 23 3 4


If you want smaller code, or more flexibility in the number of rounds,
try (24 5 24 5) or (27 8 27 8). The avalanche is only slightly worse,
but unrolling twice shaves a smidgen off the run time, so the extra 2
rotate constants come for free.

If you want something linear, there are better ways to get better mixing.
Here's one based on a period-2^128-1 xorshift generator:

void fast_mix3(struct fast_pool *f, u32 const input[4])
{
u32 a = f->pool[0] ^ input[0];
u32 b = f->pool[1] ^ input[1];
u32 c = f->pool[2] ^ input[2];
u32 d = f->pool[3] ^ input[3];

/*
* See Marsagalia, "Xorshift RNGs". Possible shift amounts
* are [5, 14, 1], [15, 4, 21], [23, 24, 3], [5, 12, 29].
*/
a ^= a<<15; a ^= a>>4 ^ d ^ d>>21; f->pool[0] = a;
b ^= b<<15; b ^= b>>4 ^ a ^ a>>21; f->pool[1] = b;
c ^= c<<15; c ^= c>>4 ^ b ^ b>>21; f->pool[2] = c;
d ^= d<<15; d ^= d>>4 ^ c ^ c>>21; f->pool[3] = d;

f->count++;
}

... However this is slower than 2 iterations of fast_mix2, and
doesn't avalanche as much.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/