Re: [PATCH v7 12/13] ext4: switch to multigrain timestamps

From: Jeff Layton
Date: Tue Sep 19 2023 - 16:46:45 EST


On Tue, 2023-09-19 at 13:10 -0700, Paul Eggert wrote:
> On 2023-09-19 09:31, Jeff Layton wrote:
> > The typical case for make
> > timestamp comparisons is comparing source files vs. a build target. If
> > those are being written nearly simultaneously, then that could be an
> > issue, but is that a typical behavior?
>
> I vaguely remember running into problems with 'make' a while ago
> (perhaps with a BSDish system) when filesystem timestamps were
> arbitrarily truncated in some cases but not others. These files would
> look older than they really were, so 'make' would think they were
> up-to-date when they weren't, and 'make' would omit actions that it
> should have done, thus screwing up the build.
>
> File timestamps can be close together with 'make -j' on fast hosts.
> Sometimes a shell script (or 'make' itself) will run 'make', then modify
> a file F, then immediately run 'make' again; the latter 'make' won't
> work if F's timestamp is mistakenly older than targets that depend on it.
>
> Although 'make'-like apps are the biggest canaries in this coal mine,
> the issue also affects 'find -newer' (as Bruno mentioned), 'rsync -u',
> 'mv -u', 'tar -u', Emacs file-newer-than-file-p, and surely many other
> places. For example, any app that creates a timestamp file, then backs
> up all files newer than that file, would be at risk.
>
>
> > I wonder if it would be feasible to just advance the coarse-grained
> > current_time whenever we end up updating a ctime with a fine-grained
> > timestamp?
>
> Wouldn't this need to be done globally, that is, not just on a per-file
> or per-filesystem basis? If so, I don't see how we'd avoid locking
> performance issues.
>

Maybe. Another idea might be to introduce a new timekeeper for
multigrain filesystems, but all of those would likely have to share the
same coarse-grained clock source.

So yeah, if you stat an inode and then update it, any inode written on a
multigrain filesystem within the same jiffy-sized window would have to
log an extra transaction to write out the inode. That's what I meant
when I was talking about write amplification.

>
> PS. Although I'm no expert in the Linux inode code I hope you don't mind
> my asking a question about this part of inode_set_ctime_current:
>
> /*
> * If we've recently updated with a fine-grained timestamp,
> * then the coarse-grained one may still be earlier than the
> * existing ctime. Just keep the existing value if so.
> */
> ctime.tv_sec = inode->__i_ctime.tv_sec;
> if (timespec64_compare(&ctime, &now) > 0)
> return ctime;
>
> Suppose root used clock_settime to set the clock backwards. Won't this
> code incorrectly refuse to update the file's timestamp afterwards? That
> is, shouldn't the last line be "goto fine_grained;" rather than "return
> ctime;", with the comment changed from "keep the existing value" to "use
> a fine-grained value"?

It is a problem, and Linus pointed that out yesterday, which is why I
sent this earlier today:

https://lore.kernel.org/linux-fsdevel/20230919-ctime-v1-1-97b3da92f504@xxxxxxxxxx/T/#u

Bear in mind that we're not dealing with a situation where the value has
not been queried since its last update, so we don't need to use a fine
grained timestamp there (and really, it's preferable not to do so). A
coarse one should be fine in this case.
--
Jeff Layton <jlayton@xxxxxxxxxx>