Re: [PATCH RFC 6/7] libfs: Convert simple directory offsets to use a Maple Tree
From: Chuck Lever
Date: Thu Feb 15 2024 - 08:46:34 EST
On Thu, Feb 15, 2024 at 02:06:01PM +0100, Jan Kara wrote:
> On Tue 13-02-24 16:38:01, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >
> > Test robot reports:
> > > kernel test robot noticed a -19.0% regression of aim9.disk_src.ops_per_sec on:
> > >
> > > commit: a2e459555c5f9da3e619b7e47a63f98574dc75f1 ("shmem: stable directory offsets")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > Feng Tang further clarifies that:
> > > ... the new simple_offset_add()
> > > called by shmem_mknod() brings extra cost related with slab,
> > > specifically the 'radix_tree_node', which cause the regression.
> >
> > Willy's analysis is that, over time, the test workload causes
> > xa_alloc_cyclic() to fragment the underlying SLAB cache.
> >
> > This patch replaces the offset_ctx's xarray with a Maple Tree in the
> > hope that Maple Tree's dense node mode will handle this scenario
> > more scalably.
> >
> > In addition, we can widen the directory offset to an unsigned long
> > everywhere.
> >
> > Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > Closes: https://lore.kernel.org/oe-lkp/202309081306.3ecb3734-oliver.sang@xxxxxxxxx
> > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>
> OK, but this will need the performance numbers.
Yes, I totally concur. The point of this posting was to get some
early review and start the ball rolling.
Actually we expect roughly the same performance numbers now. "Dense
node" support in Maple Tree is supposed to be the real win, but
I'm not sure it's ready yet.
> Otherwise we have no idea
> whether this is worth it or not. Maybe you can ask Oliver Sang? Usually
> 0-day guys are quite helpful.
Oliver and Feng were copied on this series.
> > @@ -330,9 +329,9 @@ int simple_offset_empty(struct dentry *dentry)
> > if (!inode || !S_ISDIR(inode->i_mode))
> > return ret;
> >
> > - index = 2;
> > + index = DIR_OFFSET_MIN;
>
> This bit should go into the simple_offset_empty() patch...
>
> > @@ -434,15 +433,15 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
> >
> > /* In this case, ->private_data is protected by f_pos_lock */
> > file->private_data = NULL;
> > - return vfs_setpos(file, offset, U32_MAX);
> > + return vfs_setpos(file, offset, MAX_LFS_FILESIZE);
> ^^^
> Why this? It is ULONG_MAX << PAGE_SHIFT on 32-bit so that doesn't seem
> quite right? Why not use ULONG_MAX here directly?
I initially changed U32_MAX to ULONG_MAX, but for some reason, the
length checking in vfs_setpos() fails. There is probably a sign
extension thing happening here that I don't understand.
> Otherwise the patch looks good to me.
As always, thank you for your review.
--
Chuck Lever