of_node_put() usage is buggy all over drivers/of/base.c?!

From: Vladimir Oltean
Date: Fri Aug 13 2021 - 21:01:47 EST


Hi,

I was debugging an RCU stall which happened during the probing of a
driver. Activating lock debugging, I see:

[ 101.710694] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
[ 101.719119] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1534, name: sh
[ 101.726763] INFO: lockdep is turned off.
[ 101.730674] irq event stamp: 0
[ 101.733716] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 101.739973] hardirqs last disabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
[ 101.748146] softirqs last enabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
[ 101.756313] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 101.762569] CPU: 4 PID: 1534 Comm: sh Not tainted 5.14.0-rc5+ #272
[ 101.774558] Call trace:
[ 101.794734] __might_sleep+0x50/0x88
[ 101.798297] __mutex_lock+0x60/0x938
[ 101.801863] mutex_lock_nested+0x38/0x50
[ 101.805775] kernfs_remove+0x2c/0x50 <---- this takes mutex_lock(&kernfs_mutex);
[ 101.809341] sysfs_remove_dir+0x54/0x70
[ 101.813166] __kobject_del+0x3c/0x80
[ 101.816733] kobject_put+0xf8/0x108
[ 101.820211] of_node_put+0x18/0x28
[ 101.823602] of_find_compatible_node+0xa8/0xf8 <--- this takes raw_spin_lock_irqsave(&devtree_lock)
[ 101.828036] sja1105_mdiobus_register+0x264/0x7a8

The pattern of calling of_node_put from under the atomic devtree_lock
context is pretty widespread in drivers/of/base.c.

Just by inspecting the code, this seems to be an issue since commit:

commit 75b57ecf9d1d1e17d099ab13b8f48e6e038676be
Author: Grant Likely <grant.likely@xxxxxxxxxx>
Date: Thu Feb 20 18:02:11 2014 +0000

of: Make device nodes kobjects so they show up in sysfs

Device tree nodes are already treated as objects, and we already want to
expose them to userspace which is done using the /proc filesystem today.
Right now the kernel has to do a lot of work to keep the /proc view in
sync with the in-kernel representation. If device_nodes are switched to
be kobjects then the device tree code can be a whole lot simpler. It
also turns out that switching to using /sysfs from /proc results in
smaller code and data size, and the userspace ABI won't change if
/proc/device-tree symlinks to /sys/firmware/devicetree/base.

v7: Add missing sysfs_bin_attr_init()
v6: Add __of_add_property() early init fixes from Pantelis
v5: Rename firmware/ofw to firmware/devicetree
Fix updating property values in sysfs
v4: Fixed build error on Powerpc
Fixed handling of dynamic nodes on powerpc
v3: Fixed handling of duplicate attribute and child node names
v2: switch to using sysfs bin_attributes which solve the problem of
reporting incorrect property size.

Signed-off-by: Grant Likely <grant.likely@xxxxxxxxxxxx>
Tested-by: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>
Cc: Rob Herring <rob.herring@xxxxxxxxxxx>
Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
Cc: David S. Miller <davem@xxxxxxxxxxxxx>
Cc: Nathan Fontenot <nfont@xxxxxxxxxxxxxxxxxx>
Cc: Pantelis Antoniou <panto@xxxxxxxxxxxxxxxxxxxxxxx>

because up until that point, of_node_put() was:

void of_node_put(struct device_node *node)
{
if (node)
kref_put(&node->kref, of_node_release);
}

and not:

void of_node_put(struct device_node *node)
{
if (node)
kobject_put(&node->kobj);
}

Either I'm holding it very, very wrong, or this is a very severe
oversight that just happened somehow to go unnoticed for 7 years.

Please tell me it's me.