Re: race condition in schedule_on_each_cpu()

From: Tejun Heo
Date: Thu Jun 06 2013 - 17:23:28 EST


Hello,

On Thu, Jun 06, 2013 at 06:14:46PM +0800, éå wrote:
> Hello, Tejun Heo
> thanks for your help,
> 1) I've tested two kernel versions for this problem:
> latest 3.10-rc3 (https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.10-rc3.tar.xz)
> latest 3.0 branch, 3.0.80 (https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.0.80.tar.xz)
>
> Both work fine when hot-removing a RAID disk.

Thanks for verifying.

> | preemption          | machine 1 | machine 2 | kernel version |
> |---------------------|-----------|-----------|----------------|
> | Fully Preemptible   | stuck     | no stuck  | 3.0.30-rt50    |
> | Low-Latency Desktop | no stuck  | no stuck  | 3.0.30-rt50    |
> | Low-Latency Desktop | no stuck  | --        | 3.0.30         |
> | default             | no stuck  | --        | 3.0.80         |
> | default             | no stuck  | --        | 3.10-rc3       |
>
> Could you suggest a way to debug this problem? For example, how can I
> debug a workqueue deadlock? I want to find the deadlock point.

I looked through the logs but the only worker depletion related
patches which pop up are around CPU hotplugs, so I don't think they
apply here. If the problem is relatively easy to reproduce && you
can't move onto a newer kernel, I'm afraid bisection probably is the
best option.
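
For illustration only, the idea behind bisection can be sketched as a
binary search over an ordered list of revisions, looking for the first
one where the hang no longer occurs. The revision names and the
is_stuck() predicate below are hypothetical placeholders; in practice
the search would be driven by git bisect against real kernel builds,
with each step being a build, boot and hot-remove test.

```python
from typing import Callable, Sequence


def first_fixed(revisions: Sequence[str],
                is_stuck: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_stuck() is False.

    Assumptions: revisions are ordered oldest to newest, the first one
    is known to get stuck, the last one is known to work, and the
    behaviour changes exactly once in between.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known stuck, hi: known working
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_stuck(revisions[mid]):
            lo = mid  # the fix landed after mid
        else:
            hi = mid  # the fix landed at or before mid
    return revisions[hi]


# Hypothetical history of 16 revisions; the fix is assumed to land at rev11.
revs = [f"rev{i}" for i in range(16)]
fix_index = 11
tested = []


def is_stuck(rev: str) -> bool:
    # Stand-in for "build this revision, hot-remove the disk, see if it hangs".
    tested.append(rev)
    return revs.index(rev) < fix_index


result = first_fixed(revs, is_stuck)
print(result, "found after", len(tested), "tests")
```

Each test halves the remaining range, so the number of builds grows
only with the logarithm of the number of candidate revisions.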

Thanks!

--
tejun