Re: futexes: Still infinite loop in get_futex_key() in 2.6.31-rc4

From: Jens Rosenboom
Date: Mon Jul 27 2009 - 08:46:19 EST


On Mon, 2009-07-27 at 14:23 +0200, Peter Zijlstra wrote:
> On Mon, 2009-07-27 at 14:16 +0200, Jens Rosenboom wrote:
> > On Mon, 2009-07-27 at 13:31 +0200, Peter Zijlstra wrote:
> > > On Mon, 2009-07-27 at 10:00 +0200, Jens Rosenboom wrote:
> > > > We have a problem with infinitely running processes on kernels at least
> > > > since 2.6.29.4. It happens on a loaded machine after running for a
> > > > couple of days,
> > >
> > > What kinds of machine, i386? Could you please enable
> > > CONFIG_FRAME_POINTER, these backtraces are quite mangled.
> >
> > i686 on an AMD dualcore Opteron, to be exact. CONFIG_FRAME_POINTER is
> > enabled, the complete kernel-config is attached, maybe some other
> > debugging options are needed? But I copied just the part pertaining to
> > the stuck process, maybe the complete log has the parts you are missing?
>
> Ah, weird. The question of course is, does an x86_64 kernel suffer the
> same problem?

Good question, but as this happens on a production machine, I cannot
easily change the installation to check this.

> > > > that a "ps ax" seems to get stuck in get_futex_key while
> > > > exiting. Sadly your patch
> > >
> > > Whose patch, and which patch? 7c8fa4f04ab956076605422d5ed37410893a8a73?
> > > That was only regarding huge pages.
> >
> > Yes, that is the one I was talking about and the commit message seemed
> > to match what I was seeing here.
>
> Are you in fact using huge pages?

The process that gets stuck is a standard ps from procps version 3.2.8,
called from within a perl script, so the answer is probably no. Let us
therefore forget that patch and treat this as a distinct issue.

> > > The only loop in get_futex_key() appears to be the one around
> > > get_user_pages_fast(), and I'm not quite sure how that could get stuck
> > > like this.
> > >
> > > Could it be glibc loops on futex_wake() returning -EFAULT?
> >
> > How would I be able to check that?
>
> strace the stuck process I think, you'd see tons of sys_futex() calls
> with FUTEX_WAKE* returning -EFAULT.

Attaching strace to the process gives just

# strace -p 12886
Process 12886 attached - interrupt to quit

and nothing further.
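
A silent strace suggests the task is not entering any new system calls, so
it is presumably either spinning in user space or looping inside a single
call in the kernel. One rough way to tell the two apart, assuming /proc is
mounted, is to sample the utime and stime counters (fields 14 and 15 of
/proc/<pid>/stat) twice and see which one grows. The sketch below is only
an illustration; the helper names parse_stat and sample are made up for
this example and are not part of procps or any other tool.

```python
import os
import sys
import time


def parse_stat(line):
    """Return (state, utime_ticks, stime_ticks) from one /proc/<pid>/stat line."""
    # comm (field 2) is wrapped in parentheses and may itself contain spaces
    # or ')', so cut at the last ')' instead of splitting the whole line.
    rest = line[line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return rest[0], int(rest[11]), int(rest[12])


def sample(pid, interval=1.0):
    """Read the counters twice and return (state, delta_utime, delta_stime)."""
    with open(f"/proc/{pid}/stat") as f:
        _, u0, s0 = parse_stat(f.read())
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as f:
        state, u1, s1 = parse_stat(f.read())
    return state, u1 - u0, s1 - s0


if __name__ == "__main__":
    pid = int(sys.argv[1])  # e.g. 12886 for the stuck ps
    state, du, ds = sample(pid)
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    print(f"state={state} user={du / hz:.2f}s system={ds / hz:.2f}s")
```

If the task is in state R and nearly all of the interval shows up as system
time, that would be consistent with a loop inside the kernel rather than
glibc retrying in user space.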
