Re: [PATCH] nfs: set block size according to pnfs_blksize first

From: Trond Myklebust
Date: Wed Jun 16 2021 - 14:51:44 EST


On Thu, 2021-06-17 at 01:51 +0800, Gao Xiang wrote:
> Hi Ted,
>
> On Wed, Jun 16, 2021 at 12:14:44PM -0400, Theodore Ts'o wrote:
> > On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote:
> > > > > Yet my question is how to deal with generic/486, should we
> > > > > just skip
> > > > > the case directly? I cannot find some proper way to get
> > > > > underlayfs
> > > > > block size or real xattr value limit.
> >
> > Note that the block size does not necessarily have thing to do with
> > the xattr value limit.  For example, in ext4, if you create the
> > file
> > system with the ea_inode feature enabled, you can create extended
> > attributes up to the maximum value of 64k defined by the xattr
> > interface --- unless, of course, there isn't enough space in the
> > file
> > system.
> >
> > (The ea_inode feature will also use shared space for large xattrs,
> > so
> > if you are storing hundreds of thousands of files that all have an
> > identical 20 kilbyte Windows security id or ACL, ea_inode is going
> > to
> > be much more efficient way of supporting that particular use case.)
>
> Thanks for your detailed explanation too.
>
> Yeah, yet it's not enabled in the issue setup I was assigned :(
>
> >
> > > > As noted above, the NFS server should really be returning
> > > > NFS4ERR_XATTR2BIG in this case, which the client, again, should
> > > > be
> > > > transforming into -E2BIG. Where does ENOSPC come from?
> > >
> > > Thanks for the detailed explanation...
> > >
> > > I think that is due to ext4 returning ENOSPC since I tested
> >
> > It's not just ext4.  From inspection, under some circumstances f2fs
> > and btrfs can return ENOSPC.
>
> I did some test on ext4 only earlier since it's our test environment.
> I didn't mean the ENOSPC behavior was ext4 only ( :-) very sorry
> about
> that if some misunderstanding here )
>
> >
> > The distinction is that E2BIG is (from the attr_set man page):
> >
> >        [E2BIG]          The value of the given attribute is too
> > large,
> >                         it exceeds the maximum allowable size of an
> >                         attribute value.
> >
> > The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE,
> > which is 65536 bytes.  Some file systems may impose a smaller max
> > allowable size.
> >
> > In contrast ENOSPC means something different:
> >
> >        ENOSPC           No space left on device.
> >
> > This isn't necessarily just block space, BTW; it might some other
> > file
> > system space --- thre might not be a free inode in the case of
> > ext4's
> > ea_inode feature.  Or it be the f2fs file system not being able to
> > allocate a node id via f2fs_alloc_nid().
> >
> > Note that generic/486 is testing a very specific case, which is
> > replacing a small xattr (16 bytes) with an xattr with a large
> > value.
> > This is would be a really interesting test for ext4 ea_inode, when
> > we
> > are replacing an xattr stored inline in the inode, or in a single
> > 4k
> > block, with an xattr stored in a separate inode.  But not the way
> > src/attr_replace_test.c (which does all of the heavy lifting for
> > the
> > generic/486 test) is currently written.
> >
>
> Yeah, as for the original case, it tried to test when it turned into
> the XFS extents format, but I'm not sure if it's just the regression
> test for such report only (similiar to ext4 inline xattr to non-
> inline
> xattr case.) rather than test the XFS_DINODE_FMT_BTREE case or
> ea_inode
> case...
>
> > So what I would suggest is to have attr_replace_test.c try to
> > determine the maximum attr value size using binary search.  Start
> > with
> > min=16, and max=65536, and try creating an xattr of a particular
> > size,
> > and then delete the xattr, and then retry creating the xattr with
> > the
> > next binary search size.  This will allow you to create a function
> > which determines the maximum size attr for a particular file
> > system,
> > especially in those cases where it is dependent on how the file
> > system
> > is configured.
>
> Considering the original XFS regression report [1], I think
> underlayfs blksize may still be needed. And binary search to get the
> maximum attr value may be another new case for this as well. Or
> alternatively just add by block size to do a trip test.
>
> Although I have no idea if we can just skip the case when testing
> with
> NFS. If getting underlayfs blksize is unfeasible, I think we might
> skip such case for now since nfs blksize is not useful for
> generic/486.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119

I'm still not seeing why we care about block sizes, and I'm certainly
not seeing why we should care about them in the NFS case.

xattrs are a form of key-value storage with atomic semantics (i.e. when
you attempt to get/set/update/remove them, then the operation either
succeeds or it fails without any side-effects).
There is no interface for doing any form of block I/O to an xattr. 
There is no requirement in the documentation that the user know
anything about block size in order to use them.

In other words, if this xfstest 'generic/486' is depending on knowledge
of the block size, then it is fundamentally a filesystem
implementation-specific test. It doesn't belong in the 'generic' test
category, because it is not testing anything that is generic to xattrs.

>
> Thanks,
> Gao Xiang
>
> >
> > > should we transform it to E2BIG instead (at least in NFS
> > > protocol)? but I'm still not sure that E2BIG is a valid return
> > > code for
> > > setxattr()...
> >
> > E2BIG is defined in the attr_set(3) man page.  ENOSPC isn't
> > mentioned
> > in the attr_set man page, but given that multiple file systems
> > return
> > ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very
> > much makes sense when creating extended attributes, we should fix
> > that
> > by adding ENOSPC to the attr_set(3) man page (which is shipped as
> > part
> > of the libattr library sources).
> >
> > Cheers,
> >
> >                                                 - Ted

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx