Re: Dual opteron various segfaults with 2.6.14.2 and earlier kernels

From: Fabio Coatti
Date: Wed Nov 23 2005 - 18:26:28 EST


Alle 23:55, mercoledì 23 novembre 2005, Andrew Walrond ha scritto:
> On Wednesday 23 November 2005 14:37, Fabio Coatti wrote:
> > Hi all,
> > I'm seeing several segfaults on a couple of HP DL585 Dual Opterons, 8Gb
> > ram each.
> >
> > The segfaults are like this:
> > factorial[17031]: segfault at 0000000000020f31 rip 00000000004035ae rsp
> > 00007fffffe287e0 error 4
> > factorial[17034]: segfault at 0000000000020f31 rip 00000000004035ae rsp
> > 00007fffffc6f450 error 4
> > factorial[17038]: segfault at 0000000000020f31 rip 00000000004035ae rsp
> > 00007fffffdbd060 error 4
> > factorial[17044]: segfault at 0000000000020f31 rip 00000000004035ae rsp
> > 00007fffffb48fa0 error 4
> > factorial[17046]: segfault at 0000000000020f31 rip 00000000004035ae rsp
> > 00007fffffc2a7f0 error 4
> > ld[3997]: segfault at 0000000000000020 rip 00002aaaaad1a525 rsp
> > 00007fffffa8e960 error 4
> > ld[4234]: segfault at 0000000000000020 rip 00002aaaaad1a525 rsp
> > 00007fffffc3a1e0 error 4
> >
> > This is only an example; often during some "make", also sed segfaults
> > (!). I've seen this with 2.6.12, 2.6.13.4, 2.6.14.2
>
> The symtoms look just like the TLB flush filter errata which affected SMP
> x86_64 kernels upto (at least) 2.6.13.4. IIRC it was fixed for 2.6.14 (at
> least I stopped using the patch after 2.6.13.4).
>
> Are you sure you saw this with 2.6.14+ ?


yes, uname says 2.6.14.2; on a second identical machine, I've just seen this:


factorial[2352]: segfault at 0000000000020f31 rip 00000000004035ae rsp
00007fffffbfaf60 error 4
factorial[2354]: segfault at 0000000000020f31 rip 00000000004035ae rsp
00007fffffe3fc70 error 4
factorial[2361]: segfault at 0000000000020f31 rip 00000000004035ae rsp
00007fffffb07c50 error 4
factorial[2358]: segfault at 0000000000020f31 rip 00000000004035ae rsp
00007fffffb07c50 error 4
factorial[2363]: segfault at 0000000000020f31 rip 00000000004035ae rsp
00007fffffe6d270 error 4

the kernel and HW are the same.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/