Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

From: Eric Paris
Date: Tue Dec 22 2009 - 09:31:59 EST


On Tue, 2009-12-22 at 09:48 +0100, Peter Zijlstra wrote:
> On Mon, 2009-12-21 at 19:17 -0500, Eric Paris wrote:
> > Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
> > I'm exploding in the scheduler. I'm running (and building) kernel
> > f7b84a6ba7eaeba4e1df8feddca1473a7db369a5. There are three distinct
> > signatures of problems. On some boots I see all 3 of these failures,
> > on others only 1 or 2 of them. That's why they are kind of split
> > up in dmesg.

With that patch reverted, the kernel appears to have built with no
oopses, circular locking reports, or BUGs. The only thing in dmesg
during the build was:

DMA-API: debugging out of memory - disabling

I'm going to repeat the build several more times to be sure, but I had
a 100% failure rate before. Sounds like this patch has got to go.

-Eric

> >
> > 1) gcc/3141 is trying to acquire lock:
> > (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
> >
> > but task is already holding lock:
> > (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>
> This is due to the page fault happening while holding the rq->lock, so
> it's an artifact of 3).
>
> > 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>
> Worrying, but probably due to the same problem as 3)
>
> > 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
> > kernel/sched_fair.c
>
> Right, hard to tell where exactly it goes bang, but could you please try
> reverting the below patch.
>
> What I suspect happens is that we hit the task_cpu(p)==cpu case, and
> then don't do __set_task_cpu()->set_task_rq(), which sets the group
> scheduling pointers (you seem to have cgroup scheduling enabled).
>
> If those pointers are wild all kinds of interesting bits can happen,
> including 3) and possibly 2).
>
> If this revert doesn't help, could you please also provide the output of
> addr2line -e vmlinux <FAULT_IP> ?
>
> ---
> commit 738d2be4301007f054541c5c4bf7fb6a361c9b3a
> Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Date: Wed Dec 16 18:04:42 2009 +0100
>
> sched: Simplify set_task_cpu()
>
> Rearrange code a bit now that its a simpler function.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Mike Galbraith <efault@xxxxxx>
> LKML-Reference: <20091216170518.269101883@xxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f92ce63..8a2bfd3 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2034,11 +2034,8 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
> return delta < (s64)sysctl_sched_migration_cost;
> }
>
> -
> void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> {
> - int old_cpu = task_cpu(p);
> -
> #ifdef CONFIG_SCHED_DEBUG
> /*
> * We should never call set_task_cpu() on a blocked task,
> @@ -2049,11 +2046,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>
> trace_sched_migrate_task(p, new_cpu);
>
> - if (old_cpu != new_cpu) {
> - p->se.nr_migrations++;
> - perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS,
> - 1, 1, NULL, 0);
> - }
> + if (task_cpu(p) == new_cpu)
> + return;
> +
> + p->se.nr_migrations++;
> + perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
>
> __set_task_cpu(p, new_cpu);
> }
>
>
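
For anyone following along, here is a minimal user-space sketch of the
failure mode Peter suspects above. It is not kernel code. The struct
layouts and every model_* name are hypothetical stand-ins, and it assumes
a task whose recorded CPU already equals the target CPU while its group
scheduling pointer has not been set yet. In the real kernel the pointer
would be stale or wild rather than NULL.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct cfs_rq { int nr_running; };

struct task {
	unsigned int cpu;         /* models task_cpu(p) */
	struct cfs_rq *cfs_rq;    /* models the group scheduling pointer */
	unsigned long nr_migrations;
};

static struct cfs_rq group_rq[NR_CPUS];

/* Models set_task_rq(): point the task at its group's per-CPU runqueue. */
static void model_set_task_rq(struct task *p, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	p->cfs_rq = &group_rq[cpu];
}

/* Models __set_task_cpu(): refresh the group pointer, then record the CPU. */
static void model_raw_set_task_cpu(struct task *p, unsigned int cpu)
{
	model_set_task_rq(p, cpu);
	p->cpu = cpu;
}

/* Shape of set_task_cpu() before the patch: the pointer is always refreshed. */
static void model_set_task_cpu_before(struct task *p, unsigned int new_cpu)
{
	if (p->cpu != new_cpu)
		p->nr_migrations++;
	model_raw_set_task_cpu(p, new_cpu);
}

/* Shape after the patch: the early return also skips the pointer refresh. */
static void model_set_task_cpu_after(struct task *p, unsigned int new_cpu)
{
	if (p->cpu == new_cpu)
		return;
	p->nr_migrations++;
	model_raw_set_task_cpu(p, new_cpu);
}

/*
 * Place a task on the CPU it already claims to be on.
 * Returns 1 if the group pointer is valid afterwards, 0 otherwise.
 */
static int run_model(int patched)
{
	struct task p = { .cpu = 0, .cfs_rq = NULL, .nr_migrations = 0 };

	if (patched)
		model_set_task_cpu_after(&p, 0);
	else
		model_set_task_cpu_before(&p, 0);

	return p.cfs_rq != NULL;
}
```

In this model run_model(0) leaves the pointer set, while run_model(1)
leaves it untouched, which is the kind of state that could later be
dereferenced in the fair scheduler paths.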

