2.3.51: sigsegv/sigbus, possible stack corruption, large procs

From: Jim Bray (jb@as220.org)
Date: Wed Mar 15 2000 - 19:58:28 EST


  Summary: suspected bug in 2.3.51 and earlier, in which a very large proc
dies with sigsegv or sigbus and a trashed-looking stack. It seems to depend
on low ram (32mb), and hence on massive paging. I was also using two swap
areas, so that could be a variable.

  Sorry if this actually went out: I'm reposting because it never showed
up in the archive. I joined the list, thinking that might now be required
to post, and will leave again as soon as this goes out (hey, I'm only on a
56k line!), so cc: me please.
 ...

  I have been seeing sigsegv or sigbus with development kernels for some
time that I'm not seeing with 2.2.14. They only happen with large images:
netscape and mozilla are good targets. An almost surefire way to get one
is to build /src/mozilla/layout, which involves an ld that grows to 54mb.
I'm running a k6-233, with the 'stepping b' bug, which involves glitches
with >32mb, so I'm running it with 32mb. I had dismissed these faults as a
buggy chip, but the chip works fine with 2.2.14, so I thought I should
mention this. Sorry I can't be more precise, but the gist of it is a
k6-233 with stepping b bug, 32 mb sdram, large images, works fine with
2.2.14. [update: happens with k6-2/300 also].

 Testbed: check out /src/mozilla and build it, or just cd into
/src/mozilla/layout and make that, and you'll get that huge ld. Clue: I
was tracking what appeared to be a mozilla bug, but which may have been
this bug in action. Based on the appearance of the backtrace, a mozilla
hacker suspected stack corruption.
  If no one else is seeing anything like this, I'd dismiss it as some
peculiarity of my suspect hardware, which is what I'd do myself if it
occurred with 2.2.14.
....

 More info on this: I just switched from the k6-233 to a k6-2/300, and
tested with the same results, so it may well be the kernel. Here is the
bt of the 61mb core file. It looks like maybe a trashed stack.

Core was generated by `/usr/bin/ld -m elf_i386 -shared -o libraptorhtml.so /usr/lib/crti.o /usr/lib/gc'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libbfd-2.9.5.0.22.so...(no debugging symbols found)...done.
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
#0 0x4003fb79 in bfd_elf32_bfd_final_link ()
   from /usr/lib/libbfd-2.9.5.0.22.so
(gdb) bt
#0 0x4003fb79 in bfd_elf32_bfd_final_link ()
   from /usr/lib/libbfd-2.9.5.0.22.so
#1 0x4003e33d in bfd_elf32_bfd_final_link ()
   from /usr/lib/libbfd-2.9.5.0.22.so
#2 0x8058aed in bfd_link_hash_lookup ()
#3 0x805635e in bfd_link_hash_lookup ()
#4 0x40079a42 in __libc_start_main () from /lib/libc.so.6
(gdb)
....
More info: switched from 32mb sdram to 64mb sdram, and the problem goes
away. No reason to think it is bad sdram, as everything else works
fine. If it isn't some weird hardware glitch, then it is probably
something in the vm code.
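
A smaller test case than the mozilla build might help separate a vm
problem from an ld problem. What follows is only a minimal sketch under
stated assumptions, not something taken from the runs described above: the
name fill_and_verify, the 4096-byte page size, and the 64mb default are all
chosen for illustration. It writes an offset-derived value into every page
of a buffer, then reads each page back; with the buffer larger than
physical ram, the read pass has to bring the early pages back from swap.

```python
import struct
import sys

PAGE = 4096  # assumed page size (i386); not taken from the report


def fill_and_verify(size_mb, passes=2):
    """Write one tagged word pair per page, then re-read every page.

    Returns a list of (pass_number, offset) for each page whose tag
    did not read back as written.
    """
    size = size_mb * 1024 * 1024
    buf = bytearray(size)
    bad = []
    for p in range(passes):
        # Tag each page with its own offset and the current pass number.
        for off in range(0, size, PAGE):
            struct.pack_into("<II", buf, off, off, p)
        # Re-read every page; a mismatch means the contents changed
        # somewhere between the write and the read.
        for off in range(0, size, PAGE):
            if struct.unpack_from("<II", buf, off) != (off, p):
                bad.append((p, off))
    return bad


if __name__ == "__main__":
    # Size in mb; the default of 64 is an assumption meant to exceed 32mb ram.
    mb = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    bad = fill_and_verify(mb)
    print("%d mb: %d bad pages" % (mb, len(bad)))
```

A non-empty result on a 32mb box that goes away with 64mb would point the
same way as the ld crashes; a clean result would not rule the kernel out,
since this only exercises anonymous memory and not ld's own access pattern.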

Jim http://as220.org/jb



