Re: [PATCH v6 6/7] kernfs: Introduce hashed rw-sem to replace per-fs kernfs_rwsem.

From: Tejun Heo
Date: Mon Feb 14 2022 - 13:10:51 EST


On Mon, Feb 14, 2022 at 11:03:21PM +1100, Imran Khan wrote:
> +/**
> + * up_write_kernfs_rwsem_for_two_nodes() - Release hashed rwsem for 2 nodes
> + *
> + * @kn1: kernfs_node for which hashed rwsem needs to be released
> + * @kn2: kernfs_node for which hashed rwsem needs to be released
> + *
> + * In case of nested locking, rwsem with higher address is released first.
> + */
> +static inline void up_write_kernfs_rwsem_for_two_nodes(struct kernfs_node *kn1,
> + struct kernfs_node *kn2)
> +{
> + struct rw_semaphore *rwsem1 = kernfs_rwsem_ptr(kn1);
> + struct rw_semaphore *rwsem2 = kernfs_rwsem_ptr(kn2);
> +
> + if (rwsem1 == rwsem2)
> + up_write(rwsem1);
> + else {
> + if (rwsem1 > rwsem2) {
> + up_write(rwsem1);
> + up_write(rwsem2);
> + } else {
> + up_write(rwsem2);
> + up_write(rwsem1);
> + }
> + }
> +
> + kernfs_put(kn1);
> + kernfs_put(kn2);
> +}

You don't need to order unlocks.

> +/**
> + * down_read_kernfs_rwsem_for_two_nodes() - Acquire hashed rwsem for 2 nodes
> + *
> + * @kn1: kernfs_node for which hashed rwsem needs to be taken
> + * @kn2: kernfs_node for which hashed rwsem needs to be taken
> + *
> + * In certain cases we need to acquire hashed rwsem for 2 nodes that don't have a
> + * parent child relationship. This is one of the cases of nested locking involving
> + * hashed rwsem and rwsem with lower address is acquired first.
> + */
> +static inline void down_read_kernfs_rwsem_for_two_nodes(struct kernfs_node *kn1,
> + struct kernfs_node *kn2)

Maybe something like kernfs_down_read_double_nodes() is enough as the name?
up/down already imply rwsem.

> +static inline void down_read_kernfs_rwsem(struct kernfs_node *kn,
> + enum kernfs_rwsem_lock_pattern ptrn)
> +{
> + struct rw_semaphore *p_rwsem = NULL;
> + struct rw_semaphore *rwsem = kernfs_rwsem_ptr(kn);
> + int lock_parent = 0;

bool?

> +
> + if (ptrn == KERNFS_RWSEM_LOCK_SELF_AND_PARENT && kn->parent)

I wonder whether it'd be clearer to separate the double lock case into its
own function. The backend implementation being shared is fine but if we had
e.g. kernfs_down_read() and kernfs_down_read_double(), wouldn't that be
simpler?

> + lock_parent = 1;
> +
> + if (lock_parent)
> + p_rwsem = kernfs_rwsem_ptr(kn->parent);
> +
> + if (!lock_parent || rwsem == p_rwsem) {
> + down_read_nested(rwsem, 0);
> + kernfs_get(kn);
> + kn->unlock_parent = 0;
> + } else {
> + /**
> + * In case of nested locking, locks are taken in order of their
> + * addresses. lock with lower address is taken first, followed
> + * by lock with higher address.
> + */
> + if (rwsem < p_rwsem) {
> + down_read_nested(rwsem, 0);
> + down_read_nested(p_rwsem, 1);
> + } else {
> + down_read_nested(p_rwsem, 0);
> + down_read_nested(rwsem, 1);
> + }
> + kernfs_get(kn);
> + kernfs_get(kn->parent);
> + kn->unlock_parent = 1;

I wouldn't put this inside kernfs_node. Either make the same decision
(whether it has a parent) in up() or return something which can be passed to
up() by the caller.

> +/**
> + * down_write_kernfs_rwsem_rename_ns() - take hashed rwsem during

kernfs_down_write_triple()?

> +static inline void up_write_kernfs_rwsem_rename_ns(struct kernfs_node *kn,
> + struct kernfs_node *current_parent,
> + struct kernfs_node *old_parent)
> +{
> + struct rw_semaphore *array[3];
> +
> + array[0] = kernfs_rwsem_ptr(kn);
> + array[1] = kernfs_rwsem_ptr(current_parent);
> + array[2] = kernfs_rwsem_ptr(old_parent);

So, we had sth like the following:

struct kernfs_rwsem_token {
struct kernfs_node held[3];
};

which the down functions return (probably as out argument), wouldn't we be
able to share up() for all variants and make the code simpler?

> +static inline void down_read_kernfs_rwsem_rename_ns(struct kernfs_node *kn,
> + struct kernfs_node *current_parent,
> + struct kernfs_node *new_parent)
> +{
> + struct rw_semaphore *array[3];
> +
> + array[0] = kernfs_rwsem_ptr(kn);
> + array[1] = kernfs_rwsem_ptr(current_parent);
> + array[2] = kernfs_rwsem_ptr(new_parent);
> +
> + if (array[0] == array[1] && array[0] == array[2]) {
> + /* All 3 nodes hash to same rwsem */
> + down_read_nested(array[0], 0);
> + } else {
> + /**
> + * All 3 nodes are not hashing to the same rwsem, so sort the
> + * array.
> + */
> + kernfs_sort_rwsems(array);
> +
> + if (array[0] == array[1] || array[1] == array[2]) {
> + /**
> + * Two nodes hash to same rwsem, and these
> + * will occupy consecutive places in array after
> + * sorting.
> + */
> + down_read_nested(array[0], 0);
> + down_read_nested(array[2], 1);
> + } else {
> + /* All 3 nodes hashe to different rwsems */
> + down_read_nested(array[0], 0);
> + down_read_nested(array[1], 1);
> + down_read_nested(array[2], 2);
> + }
> + }

How about factoring out "am I locking one, two or three?" into a function -
e.g. the sort function takes the array, sort & uniq's them into locking
token so that the down functions (for both double and triple) just do what's
the token says.

Thanks.

--
tejun