Re: No one wants to help me :-(

From: george anzinger (george@mvista.com)
Date: Fri Apr 13 2001 - 17:05:51 EST


Brian Gerst wrote:
>
> Mircea Damian wrote:
> >
> > Hello,
> >
> > I was expecting to receive some replies to my last desperate messages:
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg35446.html
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg36591.html
> >
> > My machine is dying in add_timer(). It seems to happen only on SMP
> > machines and is something related to the network driver. For some reason
> > one of the timer lists gets broken so we (we are two people trying to
> > solve this issue) wrote a "safe" timer.c which tries to rebuild the chain
> > in case it hits a NULL pointer.
> >
> > The machine is (of course) slower with this patch but at least it works.
> >
> > Maybe someone can see what the real bug is and fix it.
> >
> > Please help!
>
> I found (at least part of) the problem. In detach_timer() we test if
> the timer is pending. If it is not, the function does not remove the
> timer from the list and returns 0. The functions that call
> detach_timer() do not check the return value and unconditionally set the
> list pointers to NULL, even though the timer is still on the list.
> Patch against 2.4.3 attached, but there may be a better solution.
>
uh, but the pending test is to check for NULL pointers, so while your
change consolidates some code, I don't think it changes anything, unless
you can find a place that calls detach_timer and doesn't clear the
pointers...
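
To make that concrete, here is a minimal user-space sketch. It assumes
that "pending" simply means list.next is non-NULL; the structs and
helpers are simplified stand-ins modelled on kernel/timer.c, not the
kernel code itself, and demo_double_delete() is a hypothetical test
helper. It shows that clearing the pointers after a failed
detach_timer() only overwrites NULLs with NULLs.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list_head and timer_list. */
struct list_head { struct list_head *next, *prev; };
struct timer_list { struct list_head list; };

/* Assumption: a timer is pending exactly when list.next is non-NULL. */
static int timer_pending(const struct timer_list *t)
{
	return t->list.next != NULL;
}

/* Insert new_entry right after head in a circular doubly linked list. */
static void list_add(struct list_head *new_entry, struct list_head *head)
{
	new_entry->next = head->next;
	new_entry->prev = head;
	head->next->prev = new_entry;
	head->next = new_entry;
}

/* Unlink entry from its neighbours; entry's own pointers are left stale. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

static int detach_timer(struct timer_list *t)
{
	if (!timer_pending(t))
		return 0;
	list_del(&t->list);
	return 1;
}

/* Mirrors the unpatched callers: clear the pointers unconditionally. */
static int del_timer_sketch(struct timer_list *t)
{
	int ret = detach_timer(t);

	t->list.next = t->list.prev = NULL;
	return ret;
}

/* Hypothetical helper: delete the same timer twice, check the list. */
int demo_double_delete(void)
{
	struct list_head head = { &head, &head };
	struct timer_list t = { { NULL, NULL } };
	int first, second;

	list_add(&t.list, &head);
	first = del_timer_sketch(&t);   /* pending: really unlinked */
	second = del_timer_sketch(&t);  /* not pending: pointers already NULL */

	/* Returns 1 if both calls behaved and the list is still intact. */
	return first == 1 && second == 0 &&
	       head.next == &head && head.prev == &head;
}
```

In this model the second delete cannot leave a timer on the list with
cleared pointers, because a timer that fails the pending test was never
linked in the first place.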

For what it's worth, my look at this problem seems to indicate that the
new timer pointer is zero. This would be a problem in the network code
somewhere. I would guess that the whole structure is being released by
CPU X while CPU Y is trying to set up a timer. But then I don't really
know the network code (at all). I am just going by the error, a
dereference of zero in add_timer(), which only looks at the timer to
verify that the pointers are zero.
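
One way to confirm that theory would be a checked wrapper around the
add path. The following is only a user-space sketch under stated
assumptions: add_timer_checked() and the add_result codes are
hypothetical names, the structs are simplified stand-ins, and the real
add_timer() has no such NULL test. It separates "caller passed a zero
timer pointer" from "timer was already on a list".

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list_head and timer_list. */
struct list_head { struct list_head *next, *prev; };
struct timer_list { struct list_head list; };

/* Hypothetical result codes for the debugging wrapper below. */
enum add_result {
	ADD_OK = 0,
	ADD_NULL_TIMER = -1,      /* caller handed us a zero pointer */
	ADD_ALREADY_PENDING = -2  /* list pointers were not zero */
};

/* Hypothetical debugging wrapper; not part of kernel/timer.c. */
enum add_result add_timer_checked(struct timer_list *timer,
				  struct list_head *head)
{
	/* Catch the suspected case before the pending check touches it. */
	if (timer == NULL)
		return ADD_NULL_TIMER;

	/* Assumption: a non-NULL next pointer means "already pending". */
	if (timer->list.next != NULL)
		return ADD_ALREADY_PENDING;

	/* Link the timer at the tail of the circular list. */
	timer->list.next = head;
	timer->list.prev = head->prev;
	head->prev->next = &timer->list;
	head->prev = &timer->list;
	return ADD_OK;
}
```

A NULL check like this would not catch a structure that was freed but
whose pointer is still non-zero, so it could only narrow the problem
down, not rule out the race.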

George

> --
>
> Brian Gerst
>
> ------------------------------------------------------------------------
> diff -urN linux-2.4.3/kernel/timer.c linux/kernel/timer.c
> --- linux-2.4.3/kernel/timer.c Thu Dec 14 20:52:22 2000
> +++ linux/kernel/timer.c Fri Apr 13 13:26:08 2001
> @@ -194,6 +194,7 @@
> if (!timer_pending(timer))
> return 0;
> list_del(&timer->list);
> + timer->list.next = timer->list.prev = NULL;
> return 1;
> }
>
> @@ -217,7 +218,6 @@
>
> spin_lock_irqsave(&timerlist_lock, flags);
> ret = detach_timer(timer);
> - timer->list.next = timer->list.prev = NULL;
> spin_unlock_irqrestore(&timerlist_lock, flags);
> return ret;
> }
> @@ -246,7 +246,6 @@
>
> spin_lock_irqsave(&timerlist_lock, flags);
> ret += detach_timer(timer);
> - timer->list.next = timer->list.prev = 0;
> running = timer_is_running(timer);
> spin_unlock_irqrestore(&timerlist_lock, flags);
>
> @@ -309,7 +308,6 @@
> data= timer->data;
>
> detach_timer(timer);
> - timer->list.next = timer->list.prev = NULL;
> timer_enter(timer);
> spin_unlock_irq(&timerlist_lock);
> fn(data);


