Re: [PATCH] overlayfs: Trigger file re-evaluation by IMA / EVM after writes

From: Jeff Layton
Date: Sun Apr 09 2023 - 18:12:55 EST


On Sun, 2023-04-09 at 17:22 +0200, Christian Brauner wrote:
> On Fri, Apr 07, 2023 at 09:29:29AM -0400, Jeff Layton wrote:
> > > > > >
> > > > > > I would ditch the original proposal in favor of this 2-line patch shown here:
> > > > > >
> > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@xxxxxxxxxxxxx/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232
> > >
> > > We should cool it with the quick hacks to fix things. :)
> > >
> >
> > Yeah. It might fix this specific testcase, but I think the way it uses
> > the i_version is "gameable" in other situations. Then again, I don't
> > know a lot about IMA in this regard.
> >
> > When is it expected to remeasure? If it's only expected to remeasure on
> > a close(), then that's one thing. That would be a weird design though.
> >
> > > > > >
> > > > > >
> > > > >
> > > > > Ok, I think I get it. IMA is trying to use the i_version from the
> > > > > overlayfs inode.
> > > > >
> > > > > I suspect that the real problem here is that IMA is just doing a bare
> > > > > inode_query_iversion. Really, we ought to make IMA call
> > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in
> > > > > the upper layer. Then overlayfs could just propagate the results from
> > > > > the upper layer in its response.
> > > > >
> > > > > That sort of design may also eventually help IMA work properly with more
> > > > > exotic filesystems, like NFS or Ceph.
> > > > >
> > > > >
> > > > >
> > > >
> > > > Maybe something like this? It builds for me but I haven't tested it. It
> > > > looks like overlayfs already should report the upper layer's i_version
> > > > in getattr, though I haven't tested that either:
> > > >
> > > > -----------------------8<---------------------------
> > > >
> > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version
> > > >
> > > > IMA currently accesses the i_version out of the inode directly when it
> > > > does a measurement. This is fine for most simple filesystems, but can be
> > > > problematic with more complex setups (e.g. overlayfs).
> > > >
> > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows
> > > > the filesystem to determine whether and how to report the i_version, and
> > > > should allow IMA to work properly with a broader class of filesystems in
> > > > the future.
> > > >
> > > > Reported-by: Stefan Berger <stefanb@xxxxxxxxxxxxx>
> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > ---
> > >
> > > So, I think we want both; we want the ovl_copyattr() and the
> > > vfs_getattr_nosec() change:
> > >
> > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That
> > > is in line what we do with all other inode attributes. IOW, the
> > > overlayfs inode's i_version counter should aim to mirror the
> > > relevant layer's i_version counter. I wouldn't know why that
> > > shouldn't be the case. Asking the other way around there doesn't
> > > seem to be any use for overlayfs inodes to have an i_version that
> > > isn't just mirroring the relevant layer's i_version.
> >
> > It's less than ideal to do this IMO, particularly with an IS_I_VERSION
> > inode.
> >
> > You can't just copy up the value from the upper. You'll need to call
> > inode_query_iversion(upper_inode), which will flag the upper inode for a
> > logged i_version update on the next write. IOW, this could create some
> > (probably minor) metadata write amplification in the upper layer inode
> > with IS_I_VERSION inodes.
>
> I'm likely just missing context and am curious about this so bear with me. Why
> do we need to flag the upper inode for a logged i_version update? Any required
> i_version interactions should've already happened when overlayfs called into
> the upper layer. So all that's left to do is for overlayfs' to mirror the
> i_version value after the upper operation has returned.

> ovl_copyattr() - which copies the inode attributes - is always called after the
> operation on the upper inode has finished. So the additional query seems odd at
> first glance. But there might well be a good reason for it. In my naive
> approach I would've thought that sm along the lines of:
>
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 923d66d131c1..8b089035b9b3 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -1119,4 +1119,5 @@ void ovl_copyattr(struct inode *inode)
> inode->i_mtime = realinode->i_mtime;
> inode->i_ctime = realinode->i_ctime;
> i_size_write(inode, i_size_read(realinode));
> + inode_set_iversion_raw(inode, inode_peek_iversion_raw(realinode));
> }
>
> would've been sufficient.
>

Nope, because then you wouldn't get any updates to i_version after that
point.

Note that with an IS_I_VERSION inode we only update the i_version when
there has been a query since the last update. What you're doing above is
circumventing that mechanism. You'll get the i_version at the time of of
the ovl_copyattr, but there won't be any updates of it after that point
because the QUERIED bit won't end up being set on realinode.


> Since overlayfs' does explicitly disallow changes to the upper and lower trees
> while overlayfs is mounted it seems intuitive that it should just mirror the
> relevant layer's i_version.
>
>
> If we don't do this, then we should probably document that i_version doesn't
> have a meaning yet for the inodes of stacking filesystems.
>

Trying to cache the i_version is counterproductive, IMO, at least with
an IS_I_VERSION inode.

The problem is that a query against the i_version has a side-effect. It
has to (atomically) mark the inode for an update on the next change.

If you try to cache that value, you'll likely end up doing more queries
than you really need to (because you'll need to keep the cache up to
date) and you'll have an i_version that will necessarily lag the one in
the upper layer inode.

The whole point of the change attribute is to get the value as it is at
this very moment so we can check whether there have been changes. A
laggy value is not terribly useful.

Overlayfs should just always call the upper layer's ->getattr to get the
version. I wouldn't even bother copying it up in the first place. Doing
so is just encouraging someone to try use the value in the overlayfs
inode, when they really need to go through ->getattr and get the one
from the upper layer.
>
--
Jeff Layton <jlayton@xxxxxxxxxx>