Re: 2.6.29-rc1: BUG: unable to handle kernel paging request at000077fed87bb6bf

From: Jody McIntyre
Date: Mon Jan 19 2009 - 16:23:18 EST


On Wed, Jan 14, 2009 at 09:51:20PM -0800, Andrew Morton wrote:
> It needs to be bisected :(
>
> At a guess I'd say that some piece of code somewhere did
> schedule_delayed_work(work) and then scribbled on *work before the
> timer got to run. It could be anywhere at all in your kernel.

I tried your patch on Friday and couldn't find the bug with it. I was
going to try more debugging today but happily, a commit made over the
weekend has fixed the issue.

Thanks,
Jody

> Thomas's debugobjects code could perhaps be used to find the culprit,
> but alas the workqueue code hasn't been taught to use that framework.
>
> We could do something like this:
>
> --- a/kernel/workqueue.c~a
> +++ a/kernel/workqueue.c
> @@ -196,9 +196,14 @@ EXPORT_SYMBOL_GPL(queue_work_on);
>
> static void delayed_work_timer_fn(unsigned long __data)
> {
> - struct delayed_work *dwork = (struct delayed_work *)__data;
> - struct cpu_workqueue_struct *cwq = get_wq_data(&dwork->work);
> - struct workqueue_struct *wq = cwq->wq;
> + struct delayed_work *dwork;
> + struct cpu_workqueue_struct *cwq;
> + struct workqueue_struct *wq;
> +
> + dwork = (struct delayed_work *)__data;
> + printk("delayed_work_timer_fn: handling %p\n", dwork);
> + cwq = get_wq_data(&dwork->work);
> + wq = cwq->wq;
>
> __queue_work(wq_per_cpu(wq, smp_processor_id()), &dwork->work);
> }
> @@ -214,6 +219,8 @@ static void delayed_work_timer_fn(unsign
> int queue_delayed_work(struct workqueue_struct *wq,
> struct delayed_work *dwork, unsigned long delay)
> {
> + printk("queue_delayed_work: queueing %p\n", dwork);
> + dump_stack();
> if (delay == 0)
> return queue_work(wq, &dwork->work);
>
> _
>
> It will print an object address and a call trace in
> queue_delayed_work() and will print an object address in
> delayed_work_timer_fn().
>
> Hopefully when it oopses you'll see the bad object address which was
> obtained by delayed_work_timer_fn(). Then, you can go back through the
> trace and determine which callsite passed that object address to
> queue_delayed_work(). Then we will have our culprit.
>
> It'll possibly generate a lot of output, so logged netconsole might be
> needed.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/