Looks like you just saved my time here; Santosh has also reported a
similar race in a personal mail.. As you know, Santosh is in the same
cage as me.

On 29 January 2015 at 12:12, Ethan Zhao <ethan.zhao@xxxxxxxxxx> wrote:
> There is a race observed between the PPC changed notification handler
> worker thread and vcpu_hotplug() called within xenbus_thread() context.
> You see it: the policy had been moved to another CPU while Thread A
> still wants to take a reference on it.

Where is the race? How do you say this is racy?

> It is shown as the following WARNING:
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 kobject_get+0x41/0x50()
> Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
> ...
> [ 14.003548] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted
> ...
> [ 14.003553] Workqueue: kacpi_notify acpi_os_execute_deferred
> [ 14.003554] 0000000000000000 000000008c76682c ffff88094c793af8 ffffffff81661b14
> [ 14.003556] 0000000000000000 0000000000000000 ffff88094c793b38 ffffffff81072b61
> [ 14.003558] ffff88094c793bd8 ffff8812491f8800 0000000000000292 0000000000000000
> [ 14.003560] Call Trace:
> [ 14.003567] [<ffffffff81661b14>] dump_stack+0x46/0x58
> [ 14.003571] [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
> [ 14.003572] [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
> [ 14.003574] [<ffffffff812e16d1>] kobject_get+0x41/0x50
> [ 14.003579] [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
> [ 14.003581] [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
> [ 14.003586] [<ffffffff810b8cb2>] ? up+0x32/0x50
> [ 14.003589] [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
> [ 14.003591] [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
> [ 14.003593] [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
> [ 14.003596] [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
> [ 14.003601] [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
> [ 14.003604] [<ffffffff81089566>] ? move_linked_works+0x66/0x90
> [ 14.003606] [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
> [ 14.003609] [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
> [ 14.003611] [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
> [ 14.003614] [<ffffffff8108c910>] process_one_work+0x160/0x410
> [ 14.003616] [<ffffffff8108d05b>] worker_thread+0x11b/0x520
> [ 14.003617] [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
> [ 14.003621] [<ffffffff81092421>] kthread+0xe1/0x100
> [ 14.003623] [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 14.003628] [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
> [ 14.003630] [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 14.003631] ---[ end trace 89e66eb9795efdf7 ]---
> Thread A: Workqueue: kacpi_notify
>   acpi_processor_notify()
>     acpi_processor_ppc_has_changed()
>       cpufreq_update_policy()
>         cpufreq_cpu_get()
>           kobject_get()
>
> Thread B: xenbus_thread()
>   msg->u.watch.handle->callback()
>     handle_vcpu_hotplug_event()
>       vcpu_hotplug()
>         cpu_down()
>           __cpu_notify(CPU_DOWN_PREPARE..)
>             cpufreq_cpu_callback()
>               __cpufreq_remove_dev_prepare()
>                 update_policy_cpu()
>                   kobject_move()
> Please take a look at the code I composed, I thought of it the whole
> night.

I am not sure the problem is with kobject_move(); to me it looks like
the problem is with cpufreq_policy_put_kobj(), and that we tried to do
kobject_get() after the kobject had already been freed.

I don't agree with the solution you gave, but let's first make sure what
the problem is, and then take action against it.

Please try this patch and let us know if it fixes it for you:
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..5ced9cca4822 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1411,6 +1411,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 
 	read_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
@@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	return 0;
 }