Re: [PATCH 3/3] shmem: Add support for using full width of ino_t

From: Amir Goldstein
Date: Fri Dec 27 2019 - 10:26:16 EST


On Fri, Dec 27, 2019 at 4:30 PM Chris Down <chris@xxxxxxxxxxxxxx> wrote:
>
> The new inode64 option now uses get_next_ino_full, which always uses the
> full width of ino_t (as opposed to get_next_ino, which always uses
> unsigned int).
>
> Using inode64 makes inode number wraparound significantly less likely,
> at the cost of making some features that rely on the underlying
> filesystem not setting any of the highest 32 bits (eg. overlayfs' xino)
> not usable.

That's not an accurate statement. overlayfs xino only needs some of the
high bits to be unused, not all 32 of them. So I never had any objection
to tmpfs using 64-bit ino values (from the overlayfs perspective). My
only objection is to using the same global pool "irresponsibly" instead
of a per-sb pool for the heavy users.
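
For reference, xino only steals the top few bits of the overlay inode
number to multiplex in the fsid of the layer; very roughly, something
like this (illustrative sketch only, not the actual overlayfs code):

/*
 * Illustrative only: reserve the top "xinobits" bits of the overlay
 * inode number for the layer's fsid and keep the real inode number
 * of the underlying fs in the low bits.
 */
static inline u64 pack_xino(u64 real_ino, int fsid, int xinobits)
{
        return real_ino | ((u64)fsid << (64 - xinobits));
}

So a tmpfs ino allocator that starts from 0 and in practice never climbs
into those top bits does not get in xino's way.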

>
> Signed-off-by: Chris Down <chris@xxxxxxxxxxxxxx>
> Reported-by: Phyllipe Medeiros <phyllipe@xxxxxx>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Amir Goldstein <amir73il@xxxxxxxxx>
> Cc: Jeff Layton <jlayton@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: kernel-team@xxxxxx
> ---
> include/linux/shmem_fs.h | 1 +
> mm/shmem.c | 41 ++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index de8e4b71e3ba..d7727d0d687f 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -35,6 +35,7 @@ struct shmem_sb_info {
> unsigned char huge; /* Whether to try for hugepages */
> kuid_t uid; /* Mount uid for root directory */
> kgid_t gid; /* Mount gid for root directory */
> + bool small_inums; /* i_ino width unsigned int or ino_t */
> struct mempolicy *mpol; /* default memory policy for mappings */
> spinlock_t shrinklist_lock; /* Protects shrinklist */
> struct list_head shrinklist; /* List of shinkable inodes */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ff041cb15550..56cf581ec66d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -115,11 +115,13 @@ struct shmem_options {
> kuid_t uid;
> kgid_t gid;
> umode_t mode;
> + bool small_inums;
> int huge;
> int seen;
> #define SHMEM_SEEN_BLOCKS 1
> #define SHMEM_SEEN_INODES 2
> #define SHMEM_SEEN_HUGE 4
> +#define SHMEM_SEEN_INUMS 8
> };
>
> #ifdef CONFIG_TMPFS
> @@ -2248,8 +2250,12 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
> inode = new_inode(sb);
> if (inode) {
> /* Recycle to avoid 32-bit wraparound where possible */
> - if (!inode->i_ino)
> - inode->i_ino = get_next_ino();
> + if (!inode->i_ino) {
> + if (sbinfo->small_inums)
> + inode->i_ino = get_next_ino();
> + else
> + inode->i_ino = get_next_ino_full();
> + }

Ouch! You set yourself a trap in patch #1 and stepped into it here. The
shmem driver has a single shmem_inode_cachep serving all tmpfs
instances, so you cannot mix different ino allocators while recycling
ino numbers from the same inode cache pool.
Sorry, I did not see it coming...
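
To make the hazard concrete, here is a toy userspace simulation (not
kernel code, all names made up) of an object that keeps its i_ino while
being recycled from an inode64 sb into a 32-bit sb:

#include <stdio.h>
#include <stdint.h>

struct fake_inode {
        uint64_t i_ino;         /* 0 means "never had an ino assigned" */
};

static uint32_t small_counter = 1;              /* get_next_ino()-like pool */
static uint64_t full_counter = 1ULL << 33;      /* get_next_ino_full()-like
                                                   pool, already past 2^32 */

/* Mirrors the hunk above: only assign when the slot was never used */
static void assign_ino(struct fake_inode *inode, int small_inums)
{
        if (!inode->i_ino)
                inode->i_ino = small_inums ? small_counter++ : full_counter++;
}

int main(void)
{
        struct fake_inode obj = { 0 };

        /* First life: allocated on an inode64 mount */
        assign_ino(&obj, 0);
        printf("inode64 mount: ino = %#llx\n", (unsigned long long)obj.i_ino);

        /* Freed back to the shared cache; i_ino is deliberately kept */

        /* Second life: recycled by a mount that promised 32-bit inos */
        assign_ino(&obj, 1);
        printf("32-bit mount:  ino = %#llx (still above 32 bits)\n",
               (unsigned long long)obj.i_ino);
        return 0;
}

The reverse direction is no better: a recycled object can drag a number
from the 32-bit pool into an inode64 sb, where it can collide with a
number handed out by the 64-bit counter.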

I'm afraid the recycling method cannot work alongside a per-sb ino
allocator :/ (unless get_next_ino() were a truncated version of the same
counter that get_next_ino_full() uses).

You could still apply the recycling method if the shmem inode cache were
never tainted by any other ino allocator, but that would lead to
unpredictable results for users and to behavior that is quite hard to
document.

IOW, I don't see a decent way out other than making the shmem ino
allocator properly per-sb, and doing that *before* adding the per-sb
mount option inode64.

And because of this, I think the global get_next_ino_full() API is
a mistake.
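
FWIW, a per-sb allocator could be as simple as something along these
lines (rough sketch, untested; it assumes a new next_ino field in
shmem_sb_info and reuses the existing stat_lock):

/*
 * Rough sketch, untested: hand out inos from a counter owned by the
 * superblock instead of from any global pool.
 */
static ino_t shmem_get_ino(struct shmem_sb_info *sbinfo)
{
        ino_t ino;

        spin_lock(&sbinfo->stat_lock);
        ino = sbinfo->next_ino++;
        spin_unlock(&sbinfo->stat_lock);

        return ino;
}

With a counter that is private to the mount, the shared inode cache
stops being a problem and the recycling trick is not needed at all;
inode64 then mostly becomes a question of whether that counter is
allowed to grow past 32 bits.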

Thanks,
Amir.