Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup

From: Kohli, Gaurav
Date: Tue Jun 05 2018 - 07:13:56 EST


Hi Peter,

As mentioned in my last mail, we are still seeing the issue with the latest approach. Below is the suspected race, as described earlier:
Controller Thread                               CPUHP Thread
takedown_cpu
kthread_park
kthread_parkme
Set KTHREAD_SHOULD_PARK
                                                smpboot_thread_fn
                                                set Task interruptible


wake_up_process
if (!(p->state & state))
        goto out;

                                                kthread_parkme
                                                SET TASK_PARKED
                                                schedule
                                                raw_spin_lock(&rq->lock)
ttwu_remote
waiting for __task_rq_lock
                                                context_switch

                                                finish_lock_switch



                                                Case TASK_PARKED
                                                kthread_park_complete


SET Running


So it seems the issue is still there with the latest mentioned fix:
kthread, sched/wait: Fix kthread_parkme() completion issue.
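
To make the ordering above easier to follow, here is a minimal userspace sketch that replays the same interleaving in a single thread. It is only a model, not kernel code: struct model_task, waker_state_check(), waker_finish(), target_parkme() and replay_race() are hypothetical names, and the state values are illustrative rather than copied from a particular kernel version. It assumes the waker passes the state check while the task is still TASK_INTERRUPTIBLE and only writes the running state after the task has already set TASK_PARKED.

```c
/* Illustrative stand-ins for the kernel task-state bits (assumed values). */
enum {
    TASK_RUNNING         = 0x0,
    TASK_INTERRUPTIBLE   = 0x1,
    TASK_UNINTERRUPTIBLE = 0x2,
    TASK_PARKED          = 0x40,
    TASK_NORMAL          = TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE,
};

/* Hypothetical, heavily reduced view of the cpuhp thread. */
struct model_task {
    unsigned int state;
    int should_park;
};

/* Waker, first half: the "if (!(p->state & state)) goto out;" check. */
static int waker_state_check(const struct model_task *p, unsigned int mask)
{
    return (p->state & mask) != 0;
}

/* Waker, second half: runs later, once it finally gets the rq lock. */
static void waker_finish(struct model_task *p)
{
    p->state = TASK_RUNNING;
}

/* Target: what kthread_parkme() does to the state in this model. */
static void target_parkme(struct model_task *p)
{
    if (p->should_park)
        p->state = TASK_PARKED;
}

/* Replays the interleaving from the diagram and returns the final state. */
unsigned int replay_race(void)
{
    struct model_task t = { TASK_RUNNING, 0 };
    int passed;

    t.should_park = 1;                /* controller: kthread_park() sets the flag */
    t.state = TASK_INTERRUPTIBLE;     /* cpuhp: smpboot_thread_fn() */
    passed = waker_state_check(&t, TASK_NORMAL); /* controller: wake_up_process() */
    target_parkme(&t);                /* cpuhp: SET TASK_PARKED, then schedule */
    if (passed)
        waker_finish(&t);             /* controller: stale wakeup overwrites the state */
    return t.state;
}
```

In this model replay_race() returns TASK_RUNNING rather than TASK_PARKED, so a later check that expects the task to be in TASK_PARKED would not match.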

Regards
Gaurav

On 5/7/2018 4:53 PM, Kohli, Gaurav wrote:
Corrected the formatting, sorry for the spam.



Hi Peter,

We have tested with the new patch and are still seeing the same issue. These dumps do not contain debug traces, but from code review it seems a race still exists. Can you please check it once:

Controller Thread                               CPUHP Thread
takedown_cpu
kthread_park
kthread_parkme
Set KTHREAD_SHOULD_PARK
                                                smpboot_thread_fn
                                                set Task interruptible


wake_up_process

                                                kthread_parkme
                                                SET TASK_PARKED
                                                schedule
                                                raw_spin_lock(&rq->lock)

                                                context_switch

                                                finish_lock_switch



                                                Case TASK_PARKED
                                                kthread_park_complete


SET TASK_INTERRUPTIBLE


We are also seeing the same warning during unpark of cpuhp from the controller:
        if (!wait_task_inactive(p, state)) {
                WARN_ON(1);
                return;
        }
[  325.065893] [<ffffff8920ed0200>] kthread_unpark+0x80/0xd8
[  325.065902] [<ffffff8920eab754>] bringup_cpu+0xa0/0x12c
[  325.065910] [<ffffff8920eaae90>] cpuhp_invoke_callback+0xb4/0x5c8
[  325.065917] [<ffffff8920eabd98>] cpuhp_up_callbacks+0x3c/0x154
[  325.065924] [<ffffff8920ead220>] _cpu_up+0x134/0x208
[  325.065931] [<ffffff8920ead45c>] do_cpu_up+0x168/0x1a0
[  325.065938] [<ffffff8920ead4b8>] cpu_up+0x24/0x30
[  325.065948] [<ffffff89215b1408>] cpu_subsys_online+0x20/0x2c
[  325.065956] [<ffffff89215aac64>] device_online+0x70/0xb4
[  325.065962] [<ffffff89215aad78>] online_store+0xd0/0xdc
[  325.065971] [<ffffff89215a7424>] dev_attr_store+0x40/0x54
[  325.065982] [<ffffff89210d8a98>] sysfs_kf_write+0x5c/0x74
[  325.065988] [<ffffff89210d7b9c>] kernfs_fop_write+0xcc/0x1ec
[  325.065999] [<ffffff8921049288>] vfs_write+0xb4/0x1d0
[  325.066006] [<ffffff892104a858>] SyS_write+0x60/0xc0
[  325.066014] [<ffffff8920e83770>] el0_svc_naked+0x24/0x28


And after this, the same crash occurred:
[  325.521307] [<ffffff8920ed4aac>] smpboot_thread_fn+0x26c/0x2c8
[  325.527295] [<ffffff8920ecfb24>] kthread+0xf4/0x108

I will add more ftrace debug points to check exactly what is going on.

Regards
Gaurav






--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.