of staroffice 5.0, linux 2.2 and select()

Janos Farkas (chexum@shadow.banki.hu)
Fri, 2 Apr 1999 20:51:45 +0200


Hi!

Although I'm not sure many people are using StarOffice here, I've run
into something which could be (at least partly) caused by the kernel, so
I'm asking for help here.

The short story: StarOffice 5.01 behaves erratically under 2.2.x, this
starts to be a well known fact, but noone knows why, even the developers
are silent about this (although I've not yet asked; they're just another
faceless corporation for me..). This "erratically" means everything
goes usually right, until you initiate something which involves a TCP
connection; like email reading (via POP3), sending (via SMTP), or simply
trying to browse. Then, in about 30-70% of the cases, it never manages
to do the "network" operation. (I.e. clicking on the POP3 inbox just
pops up the window, a red progress lamp lights up, and nothing happens.
You could stop it though, but no similar network operation will succeed
until you restart SO). [In most of my experiments, the "network" was
the local host itself, so no TCP wizardry needed to read further :)]

It seems it's somehow related to startup: when StarOffice "sees" the
network, it will see until you quit it; when it's failing, it will fail
too. It's pretty random, but usually mostly doesn't work.

All this with any 2.2.x kernel, and not with any 2.0.x kernels.

Now, I got the stomach, and tried the longish binary search, my findings
are: it's everything perfect until 2.2.0-pre7, and the current erratic
behavior is in since 2.2.0-pre8 (and in all 2.2.x). This is obviously
not enough data to gather debuggers :) So I went further. 2.2.0-pre8
was a quite sizable patch, bringing in a few new drivers, a few
non-intel patches, brought back the AVL trees for memory handling, very
few TCP behavior changes; these are suspicious, but innocent, since it
most probably won't affect local connections. And another thing:
bringing in the "scalable" version of select/poll. And indeed, that's
it.

Again, pre-7 works fine with SO; pre-8 doesn't. And pre-8 works, if I
revert the the files: fs/select.c and include/linux/poll.h to their
pre-7 state.

Now, here's my plea for help, I don't know how to go on. I'm gazing at
these changes for a day now, and can't see any "substantial" behavior
change. Sure, it was made a bit cleaner (hah :), but it still seems to
do the same thing... And let me repeat, only the changes to these two
files are enough to make SO behave badly with the net.

Alan? A few months ago Bill Hawes would be the man to pinpoint a few
hours later "try this patch, there was a race in that code although I'm
not sure how could you bump into it", but now he's swallowed by that
secret lab in CA :)

The short story of course doesn't mention that SO is multithreaded,
comes with a "known-working" glibc version; and that's it's impossible
to strace into threads, nor with strace -f at the beginning, nor with
strace -p once they are running, and all that minor difficulties...

Janos

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/