Re: [PATCH v2] inotify: Increase default inotify.max_user_watches limit to 1048576

From: Waiman Long
Date: Thu Oct 29 2020 - 14:10:02 EST


On 10/29/20 1:27 PM, Amir Goldstein wrote:
On Thu, Oct 29, 2020 at 5:46 PM Waiman Long <longman@xxxxxxxxxx> wrote:
The default value of inotify.max_user_watches sysctl parameter was set
to 8192 since the introduction of the inotify feature in 2005 by
commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too
small for many modern usage. As a result, users have to explicitly set
it to a larger value to make it work.

After some searching around the web, these are the
inotify.max_user_watches values used by some projects:
- vscode: 524288
- dropbox support: 100000
- users on stackexchange: 12228
- lsyncd user: 2000000
- code42 support: 1048576
- monodevelop: 16384
- tectonic: 524288
- openshift origin: 65536

Each watch point adds an inotify_inode_mark structure to an inode to
be watched. It also pins the watched inode as well as an inotify fdinfo
procfs file.

Modeled after the epoll.max_user_watches behavior to adjust the default
value according to the amount of addressable memory available, make
inotify.max_user_watches behave in a similar way to make it use no more
than 1% of addressable memory within the range [8192, 1048576].

For 64-bit archs, inotify_inode_mark plus 2 inode have a size close
to 2 kbytes. That means a system with 196GB or more memory should have
the maximum value of 1048576 for inotify.max_user_watches. This default
should be big enough for most use cases.

With my x86-64 config, the size of xfs_inode, proc_inode and
inotify_inode_mark is 1680 bytes. The estimated INOTIFY_WATCH_COST is
1760 bytes.

[v2: increase inotify watch cost as suggested by Amir and Honza]

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
fs/notify/inotify/inotify_user.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 186722ba3894..37d9f09c226f 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -37,6 +37,16 @@

#include <asm/ioctls.h>

+/*
+ * An inotify watch requires allocating an inotify_inode_mark structure as
+ * well as pinning the watched inode and adding inotify fdinfo procfs file.
Maybe you misunderstood me.
There is no procfs file per watch.
There is a procfs file per inotify_init() fd.
The fdinfo of that procfile lists all the watches of that inotify instance.
Thanks for the clarification. Yes, I probably had misunderstood you because of the 2 * sizeof(inode) figure you provided.
+ * The increase in size of a filesystem inode versus a VFS inode varies
+ * depending on the filesystem. An extra 512 bytes is added as rough
+ * estimate of the additional filesystem inode cost.
+ */
+#define INOTIFY_WATCH_COST (sizeof(struct inotify_inode_mark) + \
+ 2 * sizeof(struct inode) + 512)
+
I would consider going with double the sizeof inode as rough approximation for
filesystem inode size.

It is a bit less arbitrary than 512 and it has some rationale behind it -
Some kernel config options will grow struct inode (debug, smp)
The same config options may also grow the filesystem part of the inode.

And this approximation can be pretty accurate at times.
For example, on Ubuntu 18.04 kernel 5.4.0:
inode_cache 608
nfs_inode_cache 1088
btrfs_inode 1168
xfs_inode 1024
ext4_inode_cache 1096

Just to clarify, is your original 2 * sizeof(struct inode) figure include the filesystem inode overhead or there is an additional inode somewhere that I needs to go to 4 * sizeof(struct inode)?

Cheers,
Longman