Re: Revert "gro: Fix legacy path napi_complete crash"

From: Ingo Molnar
Date: Tue Mar 24 2009 - 18:02:22 EST



* David Miller <davem@xxxxxxxxxxxxx> wrote:

> From: Ingo Molnar <mingo@xxxxxxx>
> Date: Tue, 24 Mar 2009 21:54:44 +0100
>
> > * Ingo Molnar <mingo@xxxxxxx> wrote:
> >
> > > > Same forcedeth box i reported before. Config below. (note: if
> > > > you want to use it you need to run it through 'make oldconfig',
> > > > with all defaults accepted)
> > >
> > > Hm, i just had a test failure (hung interface) with this too.
> > >
> > > I'll go back to the original straight revert of "303c6a0: gro: Fix
> > > legacy path napi_complete crash", and will test it overnight - to
> > > establish a baseline of stability again. (to make sure there are
> > > no other bugs interacting)
> >
> > FYI, this plain revert is holding up fine in my tests so far - 50
> > random iterations - the previous one failed after 5 iterations.
>
> Something must be up with respect to letting interrupts in during
> certain windows of time, or similar.
>
> I'll take a look at this and hopefully Herbert or myself will be
> able to figure it out.

It definitely did not show the usual patterns of bug behavior - i'd
have found it yesterday morning if it had.

I spent most of the time trying to find a reliable reproducer
.config and system. Sometimes the bug went away with a minor change
in the .config. Until today i didn't even suspect that a mainline
change was causing this.

Also, note that i have artificially reduced the probability of UP
kernels in my randconfigs to about 12.5% (it is 50% upstream). Still,
despite that measure, the 'best' .config i found was a UP config -
i don't think that's an accident. Also, i had to fully saturate the
target CPU over gigabit to hit the bug most reliably.

Which suggests to me (empirically) that it's indeed a race and that
it needs a saturated system with lots of IRQs to trigger, and
perhaps that it needs saturated/overloaded network device queues and
complex userspace/softirq/hardirq interactions.
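
As a rough sanity check on the iteration counts quoted above (50
clean iterations with the plain revert, versus a failure after 5
with the previous patch), the sketch below computes how likely 50
clean runs would be if the bug were still present. It assumes each
iteration fails independently with a fixed probability, which the
thread does not establish. The failure rates and the helper name
prob_all_pass are hypothetical, chosen only for illustration.

```python
# Assumption (not from the thread): every test iteration fails
# independently with the same probability p_fail while the bug is present.

def prob_all_pass(p_fail: float, iterations: int) -> float:
    """Probability that `iterations` independent runs all pass."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    return (1.0 - p_fail) ** iterations


if __name__ == "__main__":
    # Hypothetical per-iteration failure rates. 0.2 is a crude guess
    # derived from "failed after 5 iterations"; the others are lower
    # rates included for comparison.
    for p_fail in (0.2, 0.1, 0.05):
        p_clean = prob_all_pass(p_fail, 50)
        print(f"p_fail={p_fail:.2f}: P(50 clean runs) = {p_clean:.4g}")
```

Under these assumptions a long clean streak is unlikely unless the
per-iteration failure rate is small, so 50 clean iterations are
reasonable evidence for the revert but not proof - a rare race could
still survive a run of that length.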

Ingo