Re: [PATCH 2/5] rhashtable: don't hold lock on first table throughout insertion.

From: Herbert Xu
Date: Fri Jul 20 2018 - 03:54:30 EST


On Fri, Jul 06, 2018 at 05:22:30PM +1000, NeilBrown wrote:
> rhashtable_try_insert() currently hold a lock on the bucket in
> the first table, while also locking buckets in subsequent tables.
> This is unnecessary and looks like a hold-over from some earlier
> version of the implementation.
>
> As insert and remove always lock a bucket in each table in turn, and
> as insert only inserts in the final table, there cannot be any races
> that are not covered by simply locking a bucket in each table in turn.
>
> When an insert call reaches that last table it can be sure that there
> is no match entry in any other table as it has searched them all, and
> insertion never happens anywhere but in the last table. The fact that
> code tests for the existence of future_tbl while holding a lock on
> the relevant bucket ensures that two threads inserting the same key
> will make compatible decisions about which is the "last" table.
>
> This simplifies the code and allows the ->rehash field to be
> discarded.
>
> We still need a way to ensure that a dead bucket_table is never
> re-linked by rhashtable_walk_stop(). This can be achieved by
> calling call_rcu() inside the locked region, and checking
> ->rcu.func in rhashtable_walk_stop(). If it is not NULL, then
> the bucket table is empty and dead.
>
> Signed-off-by: NeilBrown <neilb@xxxxxxxx>

...

> @@ -339,13 +338,16 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
> spin_lock(&ht->lock);
> list_for_each_entry(walker, &old_tbl->walkers, list)
> walker->tbl = NULL;
> - spin_unlock(&ht->lock);
>
> /* Wait for readers. All new readers will see the new
> * table, and thus no references to the old table will
> * remain.
> + * We do this inside the locked region so that
> + * rhashtable_walk_stop() can check ->rcu.func and know
> + * not to re-link the table.
> */
> call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
> + spin_unlock(&ht->lock);
>
> return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
> }

...

> @@ -964,7 +942,7 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
> ht = iter->ht;
>
> spin_lock(&ht->lock);
> - if (tbl->rehash < tbl->size)
> + if (tbl->rcu.func == NULL)
> list_add(&iter->walker.list, &tbl->walkers);
> else
> iter->walker.tbl = NULL;

This appears to be relying on implementation details within RCU.
Paul, are you OK with rhashtable doing this trick?

Thanks,
--
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt