Re: [BUG] 2.6.24-git6 soft lockup detected while running libhugetlbfs

From: Kamalesh Babulal
Date: Tue Feb 05 2008 - 01:32:54 EST


Ingo Molnar wrote:
> * Kamalesh Babulal <kamalesh@xxxxxxxxxxxxxxxxxx> wrote:
>
>> The CONFIG_NO_HZ is not set and the system seems not be truly locked
>> up ,btw wc -l of the softlockup messages is around 108 times, while
>> running the libhugetlbfs only and this is reproducible with the
>> 2.6.24-git7 also.
>
> Peter just fixed a handful of bugs in this area - does the patch below
> help?
>
> Ingo
>
> ------------------>
> Subject: debug: softlockup looping fix
> From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>
> Rafael J. Wysocki reported weird, multi-seconds delays during
> suspend/resume and bisected it back to:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@xxxxxxx>
> Date: Fri Jan 25 21:08:02 2008 +0100
>
> softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> fix it:
>
> - restore the old wakeup mechanism
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> ---
> kernel/softlockup.c | 30 ++++++++++++++++++++----------
> 1 file changed, 20 insertions(+), 10 deletions(-)
>
> Index: linux/kernel/softlockup.c
> ===================================================================
> --- linux.orig/kernel/softlockup.c
> +++ linux/kernel/softlockup.c
> @@ -101,6 +101,10 @@ void softlockup_tick(void)
>
> now = get_timestamp(this_cpu);
>
> + /* Wake up the high-prio watchdog task every second: */
> + if (now > (touch_timestamp + 1))
> + wake_up_process(per_cpu(watchdog_task, this_cpu));
> +
> /* Warn about unreasonable delays: */
> if (now <= (touch_timestamp + softlockup_thresh))
> return;
> @@ -191,11 +195,11 @@ static void check_hung_uninterruptible_t
> read_lock(&tasklist_lock);
> do_each_thread(g, t) {
> if (!--max_count)
> - break;
> + goto unlock;
> if (t->state & TASK_UNINTERRUPTIBLE)
> check_hung_task(t, now);
> } while_each_thread(g, t);
> -
> + unlock:
> read_unlock(&tasklist_lock);
> }
>
> @@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
> * debug-printout triggers in softlockup_tick().
> */
> while (!kthread_should_stop()) {
> + set_current_state(TASK_INTERRUPTIBLE);
> touch_softlockup_watchdog();
> - msleep_interruptible(10000);
> + schedule();
> +
> + if (kthread_should_stop())
> + break;
>
> if (this_cpu != check_cpu)
> continue;
>
> if (sysctl_hung_task_timeout_secs)
> check_hung_uninterruptible_tasks(this_cpu);
> +
> }
>
> return 0;
> @@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb,
> wake_up_process(per_cpu(watchdog_task, hotcpu));
> break;
> #ifdef CONFIG_HOTPLUG_CPU
> - case CPU_UP_CANCELED:
> - case CPU_UP_CANCELED_FROZEN:
> - if (!per_cpu(watchdog_task, hotcpu))
> - break;
> - /* Unbind so it can run. Fall thru. */
> - kthread_bind(per_cpu(watchdog_task, hotcpu),
> - any_online_cpu(cpu_online_map));
> case CPU_DOWN_PREPARE:
> case CPU_DOWN_PREPARE_FROZEN:
> if (hotcpu == check_cpu) {
> @@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb,
> check_cpu = any_online_cpu(temp_cpu_online_map);
> }
> break;
> +
> + case CPU_UP_CANCELED:
> + case CPU_UP_CANCELED_FROZEN:
> + if (!per_cpu(watchdog_task, hotcpu))
> + break;
> + /* Unbind so it can run. Fall thru. */
> + kthread_bind(per_cpu(watchdog_task, hotcpu),
> + any_online_cpu(cpu_online_map));
> case CPU_DEAD:
> case CPU_DEAD_FROZEN:
> p = per_cpu(watchdog_task, hotcpu);
Hi Ingo,

Thanks for the patch. The softlockup is not always reproducible, I tried six rounds without the patch to reproduce
the softlockup but was not able to. This is not seen after the 2.6.24-git8 and above, hope because of peters patch
is already there in in the git(s).

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/