Re: 2.6.26-rc6-git2: Reported regressions from 2.6.25

From: Linus Torvalds
Date: Sat Jun 14 2008 - 20:43:10 EST

On Sat, 14 Jun 2008, David Miller wrote:

> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Sat, 14 Jun 2008 14:42:05 -0700 (PDT)
>
> > On Sat, 14 Jun 2008, Rafael J. Wysocki wrote:
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10908
> > > Subject : IPF Montvale machine panic when running a network-relevent testing
> > > Submitter : Zhang, Yanmin <yanmin_zhang@xxxxxxxxxxxxxxx>
> > > Date : 2008-06-13 8:19 (2 days old)
> > > References : http://marc.info/?l=linux-kernel&m=121334523711437&w=4
> >
> > I think this got fixed by ec0a196626bd12e0ba108d7daa6d95a4fb25c2c5: "tcp:
> > Revert 'process defer accept as established' changes".
>
> No, this is looking like a different bug.

Are you sure? Because that revert seems to basically revert all changes
since 2.6.25 in tcp_rcv_established(), which is the function that oopses.
After that revert, the function is back to exactly what it used to be.

Of course, inlining makes it less obvious what effect other changes have,
but even the offset in the function (not quite at the very end of it, but
not that far off that end either) matches where you'd expect the
'tcp_defer_accept_check()' thing to have been before the revert.

Also: see the report saying

"As a matter of fact, kernel panicked at statement
"queue->rskq_accept_tail->dl_next = req" in function reqsk_queue_add,
because queue->rskq_accept_tail is NULL. The call chain is:
tcp_rcv_established => inet_csk_reqsk_queue_add => reqsk_queue_add."

and realize that the whole inet_csk_reqsk_queue_add() call only existed
in that tcp_defer_accept_check() thing, which the revert removed.
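For readers less familiar with the accept queue: the reported crash is a
tail-append onto a singly linked list whose tail pointer is NULL while its
head is not. The C sketch below is a simplified, hypothetical model (the
struct and function names are made up for illustration; this is not the
kernel's request_sock code) of the head/tail invariant that
reqsk_queue_add() relies on. In the model, an assert flags the
inconsistent state instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a request on the accept queue.
 * The real struct request_sock carries much more state. */
struct req {
	struct req *dl_next;	/* next request in the queue */
	int id;
};

/* Hypothetical model of the accept queue's head/tail pair. */
struct accept_queue {
	struct req *head;	/* models rskq_accept_head */
	struct req *tail;	/* models rskq_accept_tail */
};

/* Append req at the tail, following the same head/tail pattern
 * as reqsk_queue_add(). */
static void queue_add(struct accept_queue *q, struct req *req)
{
	/* Invariant: head and tail are both NULL or both non-NULL.
	 * If head were set while tail is NULL, the else-branch below
	 * would dereference NULL, as in the reported panic. */
	assert((q->head == NULL) == (q->tail == NULL));

	if (q->head == NULL)
		q->head = req;
	else
		q->tail->dl_next = req;

	q->tail = req;
	req->dl_next = NULL;
}

/* Pop the head entry; clear the tail when the queue becomes empty
 * so the invariant above keeps holding. */
static struct req *queue_remove(struct accept_queue *q)
{
	struct req *req = q->head;

	if (req == NULL)
		return NULL;
	q->head = req->dl_next;
	if (q->head == NULL)
		q->tail = NULL;
	return req;
}
```

In this model, any path that appends without keeping head and tail in
step (for example, leaving head set after tail has been cleared) leads
to exactly the kind of NULL tail dereference described in the report.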

IOW, I'm pretty damn sure that the bug entry above is very much a result
of the tcp_defer_accept_check() thing, and that commit ec0a196626 fixed
the bug by reverting that code.

> The behavior of that bug would not usually be a crash, but
> rather stuck connections, and I severely doubt anything in
> that specweb test setup is using the deferred-accept option
> which is a requirement for hitting those problems.

Hey, I might be wrong. But see above. I don't think I am. I think the
deferred-accept was just even buggier than you believed.

But who knows.

Linus