Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function

From: George Spelvin
Date: Mon May 30 2016 - 12:06:23 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Sat, May 28, 2016 at 03:57:19PM -0400, George Spelvin wrote:
>> +static inline unsigned int fold_hash(unsigned long x, unsigned long y)
>> {
>> + y ^= x * GOLDEN_RATIO_64;
>> + y *= GOLDEN_RATIO_64;
>> + return y >> 32;
>> }

> So does it make sense to use that pattern here too?
>
> This code doesn't much care about performance, but wants a decent hash
> from the stack of class keys.
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 81f1a7107c0e..c8498efcd5d9 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -309,10 +309,12 @@ static struct hlist_head chainhash_table[CHAINHASH_SIZE];
> * It's a 64-bit hash, because it's important for the keys to be
> * unique.
> */
> -#define iterate_chain_key(key1, key2) \
> - (((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
> - ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
> - (key2))
> +static inline u64 iterate_chain_key(u64 x, u64 y)
> +{
> + y ^= x * GOLDEN_RATIO_64;
> + y *= GOLDEN_RATIO_64;
> + return y;
> +}
>
> void lockdep_off(void)
> {

Not quite. The fold_hash() you quote is used only on 64-bit systems,
which can be assumed to have a reasonable 64-bit multiply. On 32-bit
platforms, I avoid using GOLDEN_RATIO_64 at all, since 64x64-bit
multiplies are so expensive.

You actually have only 96 bits of input. The correct prototype is:

static inline u64 iterate_chain_key(u64 key, u32 idx)

If performance mattered, I'd be inclined to use one or two iterations
of the 32-bit HASH_MIX() function, which is specifically designed
to add 32 bits to a 64-bit hash value.

A more thorough mixing would be achieved by __jhash_mix(). Basically:

static inline u64 iterate_chain_key(u64 key, u32 idx)
{
u32 k0 = key, k1 = key >> 32;

__jhash_mix(idx, k0, k1) /* Macro that modifies arguments! */

return k0 | (u64)k1 << 32;
}

(The order of arguments is chosen to perserve the two "most-hashed" values.)


Also, I just had contact from the hppa folks who have brought to my
attention that it's an example of an out-of-order superscalar CPU that
*doesn't* have a good integer multiplier. For general multiplies,
you have to move values to the FPU and the code is a pain.

Instead, it has shift-and-add instructions designed to help the compiler
generate multiplies by constants, but large ones like GOLDEN_RATIO_64
is still a pain.

Here's code to take x (in %r26) and multiply it by GOLDEN_RATIO_64,
with the result in %r28:

shladd,l %r26,1,%r26,%r19
depd,z %r19,46,47,%r31
sub %r31,%r19,%r31
shladd,l %r31,2,%r26,%r31
shladd,l %r31,2,%r26,%r31
shladd,l %r31,2,%r31,%r19
depd,z %r19,60,61,%r31
sub %r31,%r19,%r31
depd,z %r31,54,55,%r28
add,l %r31,%r28,%r31
shladd,l %r31,3,%r26,%r31
depd,z %r31,59,60,%r28
add,l %r31,%r28,%r31
shladd,l %r31,2,%r26,%r31
depd,z %r31,60,61,%r28
sub %r28,%r31,%r28
depd,z %r28,60,61,%r31
sub %r31,%r28,%r31
depd,z %r31,57,58,%r28
add,l %r31,%r28,%r28
depd,z %r28,61,62,%r28
sub %r28,%r26,%r28
shladd,l %r28,3,%r28,%r28

We're going to work on that. Either finding a better code sequence,
or using the 32-bit version of hash_64() witht he nice code I've already
found for multiplies by GOLDEN_RATIO_32.


(But thaks, Peter, for looking a the code!)