Re: [Y2038][time namespaces] Question regarding CLOCK_REALTIME support plans in Linux time namespaces

From: Carlos O'Donell
Date: Wed Nov 25 2020 - 15:38:19 EST


On 11/19/20 7:14 PM, Thomas Gleixner wrote:
> I hope you are aware that the time namespace offsets have to be set
> _before_ the process starts and can't be changed afterwards,
> i.e. settime() is not an option.

I not interested in settime(). I saw Petr's request and forwarded it on
here to further the educational conversation about CLOCK_REALTIME and
cement a consensus around this issue. I'm happy to evangelize that we
won't support settime() for the specific reasons you call out and that
way I can give architectural guidance to setup systems in a particular
way to use CRIU or VM+container if needed.

> That might limit the usability for your use case and this can't be
> changed at all because there might be armed timers and other time
> related things which would start to go into full confusion mode.

The use of time in these cases, from first principles, seems to
degenerate into:

* Verify a time-dependent action is correct.

In glibc's case:

* Verify various APIs after y2038.

In Petr's case he could split the test into two such tests
but he would have to reproduce the system state for the second
test and I expect he wants to avoid that.

In that case I think Petr has to use CRIU to start the container,
stop it, advance time, and restart. That should work perfectly
in that use case and solve all the problems by relying on the work
done by CRIU.

> The supported use case is container life migration and that _is_ very
> careful about restoring time and armed timers and if their user space
> tools screw it up then they can keep the bits and pieces.

Agreed.

> So in order to utilize that you'd have to checkpoint the container,
> manipulate the offsets and restore it.

Or use the same mechanisms CRIU uses.

I can't rely on CRIU because I have to bootstrap a toolchain and userspace.
We should be able to write a thin veneer into our own testing wrapper and
emulate whatever CRIU does. We know apriori that our test framework starts
the test without anything having been executed yet. So we have that benefit.
Currently we unshare() for NEWUSER/NEWPID/NEWNS, but I expect that will
get a little more advanced. Already having these namespaces helps immensely
when adding fs-related tests.

> Aside of this, there are other things, e.g. file times, packet
> timestamps etc. which are based on CLOCK_REALTIME. What to do about
> them? Translate these to/from name space time or not? There is a long
> list of other horrors which are related to that.

We haven't even started testing for the upcoming negative leap second ;-)

> So _you_ might say, that you don't care about file times, RTC, timers
> expiring at the wrong time, packet timestamps and whatever.

I do care about them, but only given certain contexts.

> But then the next test dude comes around and want's to test exactly
> these interfaces and we have to slap the time namespace conversions for
> REALTIME and TAI all over the place because we already support the
> minimal thing.

That's a decision you need to make when asked those questions.

> Can you see why this is a slippery slope and why I'm extremly reluctant
> to even provide the minimal 'distort realtime when the namespace starts'
> support?

I would argue this is a slippery slope fallacy. If and when we get better
vm+container support we just tear all this code out and tell people to
start using those frameworks. The vm+container frameworks have independent
reasons to exist and so will continue to improve for security isolation
purposes and end up solving time testing issues by allowing us complete
control over the VMs time.

>> Hopefully this ilustrates that real time name space is not "request for
>> ponny" :-)
>
> I can understand your pain and why you want to distort time, but please
> understand that timekeeping is complex. The primary focus must be
> correctness, scalability and maintainability which is already hard
> enough to achieve. Just for the perspective: It took us only 8 years to
> get the kernel halfways 2038 ready (filesystems still outstanding).

I agree. The upstream glibc community has been working on y2038 since 2018;
not as long as the kernel.

> So from my point of view asking for distorted time still _is_ a request
> for ponies.

I'm happy if you say it's more work than the value it provides.

> The fixed offsets for clock MONOTONIC/BOOTTIME are straight forward,
> absolutely make sense and they have a limited scope of exposure. clock
> REALTIME/TAI are very different beasts which entail a slew of horrors.
> Adding settime() to the mix makes it exponentially harder.

Right.

--
Cheers,
Carlos.