Re: connection failure after "tcp: remove max_qlen_log"

From: Eric Dumazet
Date: Sat Jan 23 2016 - 19:12:07 EST


On Sat, 2016-01-23 at 15:54 -0800, Kui Zhang wrote:
> Hello,
>
> One of our java software would not start after this:
>
> # first bad commit: [ef547f2ac16bd9d77a780a0e7c70857e69e8f23f] tcp:
> remove max_qlen_log
>
>
> Log from software:
>
> STATUS | monitor | 2016/01/23 14:22:55 | Launching a Service...
> INFO | buserver | 2016/01/23 14:23:00 | WrapperManager class
> initialized by thread: main Using classloader:
> sun.misc.Launcher$AppClassLoader@2876b359
> INFO | buserver | 2016/01/23 14:23:00 |
> INFO | buserver | 2016/01/23 14:23:00 | Wrapper Manager: JVM #1
> INFO | buserver | 2016/01/23 14:23:00 | Running a 64-bit JVM.
> INFO | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Registering
> shutdown hook
> INFO | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Using wrapper
> INFO | buserver | 2016/01/23 14:23:00 | Load native library. One or
> more attempts may fail if platform specific libraries do not exist.
> INFO | buserver | 2016/01/23 14:23:00 | Loading native library
> failed: libwrapper-linux-x86-64.so Cause:
> java.lang.UnsatisfiedLinkError: no wrapper-linux-x86-64 in
> java.library.path
> INFO | buserver | 2016/01/23 14:23:00 | Loaded native library: libwrapper.so
> INFO | buserver | 2016/01/23 14:23:00 | Calling native initialization method.
> INFO | buserver | 2016/01/23 14:23:00 | Inside native WrapperManager
> initialization method
> INFO | buserver | 2016/01/23 14:23:00 | Java Version :
> 1.7.0_80-b15 Java HotSpot(TM) 64-Bit Server VM
> INFO | buserver | 2016/01/23 14:23:00 | Java VM Vendor : Oracle Corporation
> INFO | buserver | 2016/01/23 14:23:00 |
> INFO | buserver | 2016/01/23 14:23:00 |
> WrapperManager.start(com.r1soft.backup.server.BUServerWrapper@7e515f40,
> args["wait"]) called by thread: main
> INFO | buserver | 2016/01/23 14:23:00 | Open socket to
> wrapper...Wrapper-Connection
> ERROR | monitor | 2016/01/23 14:23:24 | Startup failed: Timed out
> waiting for a signal from the Service.
>
>
> ### normal output around this point
> ### INFO | buserver | 2016/01/23 14:30:57 | Opened Socket from 31000 to 32000
>
>
> ADVICE | monitor | 2016/01/23 14:23:24 |
> ADVICE | monitor | 2016/01/23 14:23:24 |
> ------------------------------------------------------------------------
> ADVICE | monitor | 2016/01/23 14:23:24 | Advice:
> ADVICE | monitor | 2016/01/23 14:23:24 | The Wrapper consists of a
> native component as well as a set of classes
> ADVICE | monitor | 2016/01/23 14:23:24 | which run within the Service
> that it launches. The Java component of the
> ADVICE | monitor | 2016/01/23 14:23:24 | Wrapper must be initialized
> promptly after the Service is launched or the
> ADVICE | monitor | 2016/01/23 14:23:24 | Wrapper will timeout, as
> just happened. Most likely the main class
> ADVICE | monitor | 2016/01/23 14:23:24 | specified in the Wrapper
> configuration file is not correctly initializing
> ADVICE | monitor | 2016/01/23 14:23:24 | the Wrapper classes:
> ADVICE | monitor | 2016/01/23 14:23:24 | Main
> ADVICE | monitor | 2016/01/23 14:23:24 | While it is possible to do
> so manually, the Wrapper ships with helper
> ADVICE | monitor | 2016/01/23 14:23:24 | classes to make this
> initialization processes automatic.
> ADVICE | monitor | 2016/01/23 14:23:24 | Please review the
> integration section of the Wrapper's documentation
> ADVICE | monitor | 2016/01/23 14:23:24 | for the various methods
> which can be employed to launch an application
> ADVICE | monitor | 2016/01/23 14:23:24 | within the Wrapper:
> ADVICE | monitor | 2016/01/23 14:23:24 |
> http://wrapper.tanukisoftware.org/doc/english/integrate.html
> ADVICE | monitor | 2016/01/23 14:23:24 |
> ------------------------------------------------------------------------
> ADVICE | monitor | 2016/01/23 14:23:24 |
> ERROR | monitor | 2016/01/23 14:23:24 | JVM did not exit on request,
> terminated
> DEBUG | monitor | 2016/01/23 14:23:24 | Signal trapped. Details:
> DEBUG | monitor | 2016/01/23 14:23:24 | signal number=17
> (SIGCHLD), source="unknown"
> DEBUG | monitor | 2016/01/23 14:23:24 | Received SIGCHLD, calling wait().
> DEBUG | monitor | 2016/01/23 14:23:24 | wait() returned, child
> process should be gone.
> STATUS | monitor | 2016/01/23 14:23:24 | Service Restarts disabled.
> Shutting down.
> STATUS | monitor | 2016/01/23 14:23:24 | <-- Monitor Stopped
>
>
>
> strace snippet:
>
> 10685 nanosleep({0, 100000000}, <unfinished ...>
> 10684 <... nanosleep resumed> NULL) = 0
> 10684 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
> 10684 fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
> 10684 bind(4, {sa_family=AF_INET, sin_port=htons(32000),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> 10684 stat("/etc/localtime", {st_dev=makedev(9, 2), st_ino=1828322,
> st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=8, st_size=2875, st_atime=2015/10/13-11:06:06,
> st_mtime=2015/10/13-11:06:06, st_ctime=2015/10/13-11:06:06}) = 0
> 10684 listen(4, 0) = 0


listen(fd, 0) ?

> 10684 accept(4, 0x7ffcae409990, [16]) = -1 EAGAIN (Resource
> temporarily unavailable)
...
> 10684 accept(4, 0x7ffcae409990, [16]) = -1 EAGAIN (Resource
> temporarily unavailable)
...

It seems the bug is in user space program.

Use listen(fd, 10) or listen(fd, 1000) ?