Re: Need help debugging memory corruption

From: Jarek Poplawski
Date: Mon May 05 2008 - 03:24:31 EST


On Sun, May 04, 2008 at 02:55:29PM -0500, Jay Cliburn wrote:
> On Sun, 04 May 2008 16:55:08 +0200
> Jarek Poplawski <jarkao2@xxxxxxxxx> wrote:
>
> > Jarek Poplawski wrote, On 05/04/2008 04:24 PM:
> > ...
> >
> > > I'm definitely with less experience, so I wonder why it can't be
> > > a simple race between atl1_clean_rx_ring() and something (maybe even
> > > pending atl1_intr_rx()) on the other cpu writing skb while kfreeing?
> >
> >
> > Hmm... atl1_intr_rx() looks impossible, so atl1_alloc_rx_buffers()?
>
> I booted with nosmp and the bug is *much* harder to hit, but I still
> hit it once out of about 10 tries. Does the fact that I hit it once
> using nosmp disprove the race theory?

Probably not: I don't know how about preemption model, but especially
some maybe unkilled timers/watchdogs or workqueues could be considered.
Of course this idea looks very unprobable (should happen with less
than 4GB too), but should be quite easy to verify by adding some
temporary spinlocks around these rx ring operations?

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/