Re: [PATCH] Split futex global spinlock futex_lock

From: Jamie Lokier
Date: Wed Sep 10 2003 - 06:08:39 EST


Hu, Boris wrote:
> Split futex global spinlock futex_lock into hash bucket spinlocks.

> +/*
> + * Split the global futex_lock into every hash list lock.
> + */
> +struct futex_hash_bucket {
> + struct list_head chain;
> + spinlock_t lock;
> +};

Put "lock" first: it is always the first field accessed. That will
save a few clock cycles on some systems.

I was going to suggest something about cache alignment, but of course
that doesn't make sense. If the structure is made any larger,
it might as well contain more hash buckets.

Thinking a little deeper, it occurs to me that for scalable SMP
performance, you want:

(1 << FUTEX_HASHBITS) > some factor * SMP_CACHE_BYTES * NR_CPUS / sizeof (bucket)

To put it into perspective, consider a hypothetical 16-way P4.
SMP_CACHE_BYTES is 128 on a P4. That's 24 cache lines in the whole hash table.

If there are only a few futexes in the table at any time, the dominant
time for each operation is going to be the spinlock and associated
cache line transfers, not traversing a bucket's list.

So that hypothetical box would have an effective hash table of only 24 buckets.

What I'm saying is:

If you're able to benchmark changes to the code on a big box,
and you want to tune it's performance, try changing
FUTEX_HASHBITS. Also try adding a dummy word into
futex_hash_bucket, and use __cache_aligned_in_smp on the array so that
no buckets straddle two cache lines.

Thought for the day...
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/