Re: [GIT PULL v2] timestamp fixes

From: Linus Torvalds
Date: Sat Sep 23 2023 - 16:04:23 EST


On Sat, 23 Sept 2023 at 12:30, Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> It depends on what conversion we need to do. If we're converting to
> userspace's timespec64 data structure, which is denominated in
> nanosecods, it's actually much easier to use decimal 100ns units:

Actually, your data format seems to be a mix of "decimal nanoseconds"
and then a power-of-two seconds (ie bit shift).

Except it looks like ext4 actually does full nanosecond resolution (30
bits for nanoseconds, 34 bits for seconds). Thus the "only a couple of
hundred years of range".

And yes, that's probably close to optimal. It makes it harder to do
*math* on those dates (because you have seconds and 100ns as separate
fields), but for file timestamps that's likely not a real issue.

It was for 'ktime_t()', where with timers etc the whole "subtract and
add times" happens *all* the time, but for file timestamps you
basically have "set time" together with possibly comparing them (and
you can do comparisons without even splitting the fields if you lay
things out reasonably - which you ext4 doesn't seem to have done).

So yeah, I think that would be a fine 'fstime_t' for the kernel.

Except we'd do it without the EXT4_EPOCH_MASK conditionals, and I
think it would be better to have a bigger range for seconds. If you
give the seconds field three extra bits, you're comfortable in the
"thousands of years", and you still have 27 bits that can encode a
decimal "100ns".

It means that when you convert a fstime_t to timespec64 at stat()
time, you'd have to do a 32-bit "multiply by 100" to get the actual
nanosecond field, but that's cheap everywhere (and obviously the
shift-and-masks to get the separate fields out of the 64-bit value).

Linus