[PATCH 1/4] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods

From: Tejun Heo
Date: Thu Jul 09 2015 - 16:40:21 EST


blkcg_pol_mutex primarily protects the blkcg_policy array. It also
protects cgroup file type [un]registration during policy addition /
removal. This puts blkcg_pol_mutex outside cgroup internal
synchronization and in turn makes it impossible to grab from blkcg's
cgroup methods as that leads to cyclic dependency.

Another problematic dependency arising from this is through cgroup
interface file deactivation. Removing a cftype requires removing all
files of the type which in turn involves draining all on-going
invocations of the file methods. This means that an interface file
implementation can't grab blkcg_pol_mutex as draining can lead to AA
deadlock.

blkcg_reset_stats() is already in this situation. It currently
trylocks blkcg_pol_mutex and then unwinds and retries the whole
operation on failure, which is cumbersome at best. It has a lengthy
comment explaining how cgroup internal synchronization is involved and
expected to be updated but as explained above this doesn't need cgroup
internal locking to deadlock. It's a self-contained AA deadlock.

The described circular dependencies can be easily broken by moving
cftype [un]registration out of blkcg_pol_mutex and protect them with
an outer mutex. This patch introduces blkcg_pol_register_mutex which
wraps entire policy [un]registration including cftype operations and
shrinks blkcg_pol_mutex critical section. This also makes the trylock
dancing in blkcg_reset_stats() unnecessary. Removed.

This patch is necessary for the following blkcg_policy_data allocation
bug fixes.

Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
Cc: Arianna Avanzini <avanzini.arianna@xxxxxxxxx>
---
block/blk-cgroup.c | 40 +++++++++++++++++++++-------------------
1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 5e2723f..2ff74ff 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -29,6 +29,14 @@

#define MAX_KEY_LEN 100

+/*
+ * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
+ * blkcg_pol_register_mutex nests outside of it and synchronizes entire
+ * policy [un]register operations including cgroup file additions /
+ * removals. Putting cgroup file registration outside blkcg_pol_mutex
+ * allows grabbing it from cgroup callbacks.
+ */
+static DEFINE_MUTEX(blkcg_pol_register_mutex);
static DEFINE_MUTEX(blkcg_pol_mutex);

struct blkcg blkcg_root;
@@ -453,20 +461,7 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
struct blkcg_gq *blkg;
int i;

- /*
- * XXX: We invoke cgroup_add/rm_cftypes() under blkcg_pol_mutex
- * which ends up putting cgroup's internal cgroup_tree_mutex under
- * it; however, cgroup_tree_mutex is nested above cgroup file
- * active protection and grabbing blkcg_pol_mutex from a cgroup
- * file operation creates a possible circular dependency. cgroup
- * internal locking is planned to go through further simplification
- * and this issue should go away soon. For now, let's trylock
- * blkcg_pol_mutex and restart the write on failure.
- *
- * http://lkml.kernel.org/g/5363C04B.4010400@xxxxxxxxxx
- */
- if (!mutex_trylock(&blkcg_pol_mutex))
- return restart_syscall();
+ mutex_lock(&blkcg_pol_mutex);
spin_lock_irq(&blkcg->lock);

/*
@@ -1190,6 +1185,7 @@ int blkcg_policy_register(struct blkcg_policy *pol)
if (WARN_ON(pol->pd_size < sizeof(struct blkg_policy_data)))
return -EINVAL;

+ mutex_lock(&blkcg_pol_register_mutex);
mutex_lock(&blkcg_pol_mutex);

/* find an empty slot */
@@ -1198,19 +1194,23 @@ int blkcg_policy_register(struct blkcg_policy *pol)
if (!blkcg_policy[i])
break;
if (i >= BLKCG_MAX_POLS)
- goto out_unlock;
+ goto err_unlock;

/* register and update blkgs */
pol->plid = i;
blkcg_policy[i] = pol;
+ mutex_unlock(&blkcg_pol_mutex);

/* everything is in place, add intf files for the new policy */
if (pol->cftypes)
WARN_ON(cgroup_add_legacy_cftypes(&blkio_cgrp_subsys,
pol->cftypes));
- ret = 0;
-out_unlock:
+ mutex_unlock(&blkcg_pol_register_mutex);
+ return 0;
+
+err_unlock:
mutex_unlock(&blkcg_pol_mutex);
+ mutex_unlock(&blkcg_pol_register_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(blkcg_policy_register);
@@ -1223,7 +1223,7 @@ EXPORT_SYMBOL_GPL(blkcg_policy_register);
*/
void blkcg_policy_unregister(struct blkcg_policy *pol)
{
- mutex_lock(&blkcg_pol_mutex);
+ mutex_lock(&blkcg_pol_register_mutex);

if (WARN_ON(blkcg_policy[pol->plid] != pol))
goto out_unlock;
@@ -1233,8 +1233,10 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
cgroup_rm_cftypes(pol->cftypes);

/* unregister and update blkgs */
+ mutex_lock(&blkcg_pol_mutex);
blkcg_policy[pol->plid] = NULL;
-out_unlock:
mutex_unlock(&blkcg_pol_mutex);
+out_unlock:
+ mutex_unlock(&blkcg_pol_register_mutex);
}
EXPORT_SYMBOL_GPL(blkcg_policy_unregister);
--
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/