Re: Need help debugging memory corruption

From: Jarek Poplawski
Date: Mon May 05 2008 - 03:47:41 EST


On Mon, May 05, 2008 at 07:27:34AM +0000, Jarek Poplawski wrote:
> On Sun, May 04, 2008 at 02:55:29PM -0500, Jay Cliburn wrote:
> > On Sun, 04 May 2008 16:55:08 +0200
> > Jarek Poplawski <jarkao2@xxxxxxxxx> wrote:
> >
> > > Jarek Poplawski wrote, On 05/04/2008 04:24 PM:
> > > ...
> > >
> > > > I definitely have less experience, so I wonder why it can't be
> > > > a simple race between atl1_clean_rx_ring() and something (maybe even
> > > > pending atl1_intr_rx()) on the other cpu writing skb while kfreeing?
> > >
> > >
> > > Hmm... atl1_intr_rx() looks impossible, so atl1_alloc_rx_buffers()?
> >
> > I booted with nosmp and the bug is *much* harder to hit, but I still
> > hit it once out of about 10 tries. Does the fact that I hit it once
> > using nosmp disprove the race theory?
>
> Probably not: I don't know about the preemption model, but especially
> some timers/watchdogs or workqueues that maybe weren't killed could be
> considered. Of course this idea looks very improbable (it should happen
> with less than 4GB too), but it should be quite easy to verify by adding
> some temporary spinlocks around these rx ring operations?

Hmm#2... Of course, for testing without SMP, disabling local irqs should
be enough. BTW... since this check is done during __free_slab, it seems
such corruption could have taken place before this closing, so maybe
atl1_intr_rx() should still be considered in these races after all...
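
To make the locking test more concrete, here is a rough user-space
sketch of the idea, not driver code. The struct rx_ring, ring_refill(),
ring_clean() and run_race_test() names are hypothetical stand-ins for
the atl1 rx ring, atl1_alloc_rx_buffers() and atl1_clean_rx_ring(), and
a pthread mutex plays the role of the temporary spinlock (in the driver
it would presumably be something like spin_lock_irqsave() around the
same regions):

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 64
#define BUF_SIZE  256
#define ROUNDS    1000

/* Hypothetical stand-in for the rx ring: one buffer pointer per slot. */
struct rx_ring {
	pthread_mutex_t lock;	/* temporary debug lock (assumption) */
	void *buf[RING_SIZE];
};

/* Analogue of atl1_alloc_rx_buffers(): fill empty slots, write to them. */
static void ring_refill(struct rx_ring *r)
{
	pthread_mutex_lock(&r->lock);
	for (int i = 0; i < RING_SIZE; i++) {
		if (!r->buf[i])
			r->buf[i] = malloc(BUF_SIZE);
		if (r->buf[i])
			memset(r->buf[i], 0xab, BUF_SIZE);
	}
	pthread_mutex_unlock(&r->lock);
}

/* Analogue of atl1_clean_rx_ring(): free each buffer, clear the slot. */
static void ring_clean(struct rx_ring *r)
{
	pthread_mutex_lock(&r->lock);
	for (int i = 0; i < RING_SIZE; i++) {
		free(r->buf[i]);
		r->buf[i] = NULL;
	}
	pthread_mutex_unlock(&r->lock);
}

static void *refill_thread(void *arg)
{
	for (int n = 0; n < ROUNDS; n++)
		ring_refill(arg);
	return NULL;
}

/*
 * Run refill and clean concurrently, then do a final clean.
 * Returns the number of slots still allocated, or -1 on thread error.
 */
int run_race_test(void)
{
	struct rx_ring r;
	pthread_t t;
	int live = 0;

	for (int i = 0; i < RING_SIZE; i++)
		r.buf[i] = NULL;
	pthread_mutex_init(&r.lock, NULL);

	if (pthread_create(&t, NULL, refill_thread, &r) != 0) {
		pthread_mutex_destroy(&r.lock);
		return -1;
	}
	for (int n = 0; n < ROUNDS; n++)
		ring_clean(&r);
	pthread_join(t, NULL);

	ring_clean(&r);
	for (int i = 0; i < RING_SIZE; i++)
		if (r.buf[i])
			live++;

	pthread_mutex_destroy(&r.lock);
	return live;
}
```

Without the lock, the memset() in the refill path can write into a
buffer the other thread has just freed, which is the kind of
write-while-kfreeing race suspected here. If the corruption goes away
with such a lock in place, that would point at a race; if not, the race
theory gets much weaker.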

Jarek P.