Re: inotify_add_watch() returning ENOSPC in 2.6.24 [watch descriptor leak?]

From: Clem Taylor
Date: Wed Feb 06 2008 - 14:41:19 EST


On Feb 6, 2008 4:51 AM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 5 Feb 2008 20:49:42 -0500 "Clem Taylor" <clem.taylor@xxxxxxxxx> wrote:
> > I'm trying to move a MIPS based embedded system from 2.6.16.16 to
> > 2.6.24. Most things seem to be working, but I'm having trouble with
> > inotify. The code is using inotify to detect a file written to /tmp
> > (tmpfs). The writer creates a file with a temporary name and then
> > rename()s the tmp file over the file I'm monitoring.
> >
> > With 2.6.16.16, everything works fine, but with 2.6.24, the inotify
> > process runs for a while (~100 events) and then inotify_add_watch()
> > returns ENOSPC. Once this happens, I can't add new watches, even if I
> > kill the process and restart it. fs.inotify.max_user_instances and
> > fs.inotify.max_user_watches are both 128, so I'd imagine I'm hitting
> > this limit. For some reason the watches aren't getting cleaned up
> > (even after the process is killed).

> Good bug report, thanks. That code was significantly altered in June 2006
> and perhaps something broke.

I also tested on a 2.6.20 x86 desktop machine. It took ~8k iterations
to fail, which matches that machine's fs.inotify.max_user_watches. Once
the program has failed, re-running it fails immediately.

> It's a bit hard to find people who work on inotify, I'm afraid. If you had
> the time to come up with a script or program which demonstrates the bug,
> that would be super-helpful?

Attached is a simple program that demonstrates the problem. On an
affected kernel it runs for only about fs.inotify.max_user_watches
iterations before inotify_add_watch() fails with ENOSPC. On a working
kernel it should run forever.

Thanks,
Clem
/* Inotify IN_ONESHOT leak?
 *
 * This program loops creating one-shot inotify watches, triggering a close
 * write event and then waiting for the event. On 2.6.16.16 this works just
 * fine. When I moved to 2.6.24, this code fails after ~100 events.
 * fs.inotify.max_user_instances and fs.inotify.max_user_watches are both 128,
 * so I'd imagine I am hitting this limit.
 *
 * After killing and restarting the program, it fails right away, and only
 * a reboot recovers.
 *
 * This also fails on a desktop machine with 2.6.20. It took ~8k iterations
 * to fail, which matches the larger max_user_watches.
 *
 * Compile with:
 * gcc -Wall -o inotifyLeak inotifyLeak.c
 *
 * Worked in 2.6.16.16 [mipsel]
 * Fails in 2.6.20 [Fedora x86]
 * Fails in 2.6.24 [mipsel]
 */
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/inotify.h>
#include <sys/time.h>

/* makeFile(): Create a close write event for inotify to detect. */
int makeFile ( const char *filename )
{
    FILE *file;
    struct timeval tv;

    gettimeofday ( &tv, NULL );

    file = fopen ( filename, "w" );
    if ( file == NULL )
    {
        fprintf ( stderr, "Failed to open \"%s\" for writing: %s\n",
                  filename, strerror ( errno ) );
        return -1;
    }

    fprintf ( file, "%u.%06d\n", (unsigned int) tv.tv_sec, (int) tv.tv_usec );
    fclose ( file );

    return 0;
}

int main ( int argc, char *argv[] )
{
    const char filename[] = "/tmp/inotifyLeak.test";
    struct inotify_event event;
    int notifyFD, wd, ret, i;

    if ( ( notifyFD = inotify_init() ) < 0 )
    {
        fprintf ( stderr, "inotify_init() failed: %s\n", strerror ( errno ) );
        return 1;
    }

    /* create initial file */
    if ( makeFile ( filename ) < 0 )
        return 1;

    for ( i = 0 ; ; i++ )
    {
        /* create a one shot event */
        wd = inotify_add_watch ( notifyFD, filename,
                                 IN_CLOSE_WRITE | IN_DELETE_SELF | IN_ONESHOT );
        if ( wd < 0 )
        {
            /* this is the failure case */
            fprintf ( stderr, "inotify_add_watch() failed: %s [i=%d]\n",
                      strerror ( errno ), i );
            return 1;
        }

        /* create an event on the file */
        if ( makeFile ( filename ) < 0 )
            return 1;

        /* blocking read, waiting for event; a watch on a plain file
         * reports events with no trailing name, so one struct is enough */
        ret = read ( notifyFD, &event, sizeof(event) );
        if ( ret < 0 )
        {
            fprintf ( stderr, "inotify read() failed: %s\n",
                      strerror ( errno ) );
            return 1;
        }
        else if ( ret != (int) sizeof(event) )
        {
            fprintf ( stderr, "inotify read() returned %d not %d\n",
                      ret, (int) sizeof(event) );
            return 1;
        }
        else if ( event.wd != wd )
        {
            fprintf ( stderr, "Watch mismatch, expected %d, got %d\n",
                      wd, event.wd );
            return 1;
        }

        /* if we call inotify_rm_watch() here, we get EINVAL,
         * which is expected because the watch should have been deleted
         * once the event is triggered.
         */

        /* progress report... */
        fprintf ( stderr, " %d : %d \r", i, wd );
    }

    return 0;
}