Re: [PATCH] iversion: update comments with info about atime updates

From: NeilBrown
Date: Tue Aug 23 2022 - 18:26:09 EST


On Tue, 23 Aug 2022, Jeff Layton wrote:
> On Tue, 2022-08-23 at 21:38 +1000, NeilBrown wrote:
> > On Tue, 23 Aug 2022, Jeff Layton wrote:
> > > So, we can refer to that and simply say:
> > >
> > > "If the function updates the mtime or ctime on the inode, then the
> > > i_version should be incremented. If only the atime is being updated,
> > > then the i_version should not be incremented. The exception to this rule
> > > is explicit atime updates via utimes() or similar mechanism, which
> > > should result in the i_version being incremented."
> >
> > Is that exception needed? utimes() updates ctime.
> >
> > https://man7.org/linux/man-pages/man2/utimes.2.html
> >
> > doesn't say that, but
> >
> > https://pubs.opengroup.org/onlinepubs/007904875/functions/utimes.html
> >
> > does, as does the code.
> >
>
> Oh, good point! I think we can leave that out. Even better!

Further, implicit mtime updates (file_update_time()) also update ctime.
So all you need is:

  "If the function updates the ctime, then i_version should be
  incremented."

And I have to ask - why not just use the ctime? Why have another number
that runs in parallel?

Timestamps are updated at HZ (ktime_get_coarse) which is at most every
millisecond.
xfs stores nanosecond resolution, so about 20 bits are currently wasted.
We could put a counter like i_version in there that only increments
after it is viewed, then we can get all the precision we need but with
exactly ctime semantics.
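
To make the two ideas above concrete, here is a minimal sketch in Python
(not kernel code). It first checks the "about 20 bits" figure, then
models a counter that only increments after it has been viewed. The
class and method names (ChangeCounter, query, note_change) are
hypothetical, and the model covers only changes within a single
timestamp tick.

```python
# Bits needed to count nanoseconds vs. milliseconds within one second.
NSEC_BITS = (10**9 - 1).bit_length()   # nanosecond resolution
MSEC_BITS = (10**3 - 1).bit_length()   # millisecond resolution
SPARE_BITS = NSEC_BITS - MSEC_BITS     # bits that carry no information


class ChangeCounter:
    """Hypothetical model: the counter is bumped only after a reader saw it."""

    def __init__(self):
        self.value = 0
        self.seen = False  # set when the current value has been handed out

    def query(self):
        # A reader observes the value, so the next change must be
        # distinguishable from this one.
        self.seen = True
        return self.value

    def note_change(self):
        # Assumption: called for each change within the same tick.
        # Increment only if someone has seen the current value.
        if self.seen:
            self.value += 1
            self.seen = False
            return True
        return False
```

With this model, any number of changes between two queries costs at most
one increment, which is what keeps a small counter from being exhausted.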

The 64-bit change-id could comprise:
  35 bits of seconds (roughly a millennium)
  16 bits of sub-seconds (just in case a higher precision time was wanted
     one day)
  13 bits of counter - 8192 changes per tick

The value exposed in i_ctime would hide the counter and just show the
timestamp portion of what the filesystem stores. This would ensure we
never get changes on different files that happen in one order leaving
timestamps with the reversed order (the timestamps could be the same,
but that is expected).
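
As a rough illustration of the layout and of hiding the counter, here is
a Python sketch (not kernel code). The field order, with seconds in the
high bits and the counter in the low bits, is my assumption; it is
chosen so that comparing two change-ids as integers orders them by time
first and by counter second. The function names are hypothetical.

```python
SECONDS_BITS = 35
SUBSEC_BITS = 16
COUNTER_BITS = 13


def pack_change_id(seconds, subsec, counter):
    """Pack the three fields into one 64-bit integer (assumed field order)."""
    if not 0 <= seconds < (1 << SECONDS_BITS):
        raise ValueError("seconds out of range")
    if not 0 <= subsec < (1 << SUBSEC_BITS):
        raise ValueError("subsec out of range")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter out of range")
    return ((seconds << (SUBSEC_BITS + COUNTER_BITS))
            | (subsec << COUNTER_BITS)
            | counter)


def unpack_change_id(cid):
    """Split a packed change-id back into (seconds, subsec, counter)."""
    counter = cid & ((1 << COUNTER_BITS) - 1)
    subsec = (cid >> COUNTER_BITS) & ((1 << SUBSEC_BITS) - 1)
    seconds = cid >> (SUBSEC_BITS + COUNTER_BITS)
    return seconds, subsec, counter


def visible_ctime(cid):
    """What i_ctime would show: the timestamp portion, counter hidden."""
    seconds, subsec, _counter = unpack_change_id(cid)
    return seconds, subsec
```

Two changes in the same tick then get different change-ids but the same
visible timestamp, e.g. pack_change_id(s, t, 0) and
pack_change_id(s, t, 1) compare as different while visible_ctime() of
both is (s, t).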

This scheme could be made to handle a sustained update rate of 1
increment every 8 nanoseconds (if the counter were allowed to overflow
into unused bits of the sub-second field). This is one every 24 CPU
cycles. Incrementing a counter and making it visible to all CPUs can
probably be done in 24 cycles. Accessing it and setting the "seen" flag
as well might just fit with faster memory. Getting any other useful
work done while maintaining that rate on a single file seems unlikely.
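
The arithmetic behind those figures can be checked with a few lines of
Python. This is only a sanity check of the numbers quoted above: the
1 ms tick is the "at most every millisecond" case, and the clock rate is
not a measured value, just what 24 cycles per 8 ns implies.

```python
COUNTER_BITS = 13

# Distinct counter values available within one timestamp tick.
changes_per_tick = 1 << COUNTER_BITS

# Assumption: a 1 ms tick, the fastest case mentioned above.
TICK_NS = 1_000_000
ns_per_change_without_overflow = TICK_NS / changes_per_tick

# Figures quoted above: one increment every 8 ns, i.e. every 24 cycles.
INCREMENT_NS = 8
CYCLES_PER_INCREMENT = 24

# Clock rate implied by those two figures, in Hz (integer arithmetic).
implied_clock_hz = CYCLES_PER_INCREMENT * 1_000_000_000 // INCREMENT_NS
```

So without overflowing into the sub-second field, the 13-bit counter
alone covers about one viewed change every 122 ns at a 1 ms tick, and
the 24-cycle figure corresponds to a 3 GHz clock.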

NeilBrown