Re: [Y2038][time namespaces] Question regarding CLOCK_REALTIME support plans in Linux time namespaces

From: Thomas Gleixner
Date: Fri Oct 30 2020 - 16:06:10 EST


On Fri, Oct 30 2020 at 12:58, Carlos O'Donell wrote:
> On 10/30/20 11:10 AM, Thomas Gleixner via Libc-alpha wrote:
>> That's what virtual machines are for.
>
> Certainly, that is always an option, just like real hardware.
>
> However, every requirement we add to testing reduces the number of
> times that developer will run the test on their system and potentially
> catch a problem during development. Yes, CI helps, but "make check"
> gives more coverage. More kernel variants tested in all downstream rpm
> %check builds or developer systems. Just like kernel self tests help
> today.
>
> glibc uses namespaces in "make check" to increase the number of userspace
> and kernel features we can test immediately and easily on developer
> *or* distribution build systems.
>
> So the natural extension is to further isolate the testing namespace
> using the time namespace to test and verify y2038. If we can't use
> namespaces then we'll have to move the tests out to the less
> frequently run scripts we use for cross-target toolchain testing,
> and so we'll see a 100x drop in coverage.

I understand that.

> I expect that more requests for further time isolation will happen
> given the utility of this in containers.

There was a lengthy discussion about this and the only "usecase" which
was brought up was having different NTP servers in name spaces, i.e. the
leap second ones and the smearing ones.

Now imagine 1000 containers each running their own NTP. Guess what the
host does in each timer interrupt? It chases 1000 containers and updates
their notion of CLOCK_REALTIME. In the remaining 5% CPU time the 1000
containers can do their computations.

But even if you restrict it to a trivial offset without NTP
capabilities, what are the semantics of that offset when the host time
is set?

- Does the offset just stay the same, so that container time jumps
  around with the host time?

- Does it have to change so that the container's notion of realtime is
  not affected? That is pretty much equivalent to the NTP case of
  chasing a gazillion containers, except that it might leave the
  containers a bit more than 5% of the CPU time.

- Can the offset of the container be changed at runtime,
  i.e. is clock_settime() possible from within the container?

There are some other bits related to that as well, but the above is
already mindboggling.
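
To make the first two options concrete, here is a toy model in Python.
It is only a sketch, not kernel code: the Host class, its methods and
the numbers are all made up for illustration, and time is just a count
of seconds. It shows that a fixed offset makes every container jump
with the host, while keeping container time stable means touching one
offset per container on every host time change.

```python
# Toy model only, not kernel code. All names here are hypothetical.

class Host:
    def __init__(self, realtime):
        self.realtime = realtime      # host CLOCK_REALTIME, in seconds
        self.offsets = []             # one realtime offset per container
        self.offset_updates = 0       # offsets the host had to rewrite

    def add_container(self, offset):
        self.offsets.append(offset)
        return len(self.offsets) - 1  # container id

    def container_time(self, cid):
        return self.realtime + self.offsets[cid]

    def settime_fixed_offset(self, new_realtime):
        # Option 1: offsets stay the same, so every container's
        # realtime jumps together with the host.
        self.realtime = new_realtime

    def settime_compensated(self, new_realtime):
        # Option 2: containers must not notice the change, so every
        # single offset has to be adjusted by the host.
        delta = new_realtime - self.realtime
        self.realtime = new_realtime
        for cid in range(len(self.offsets)):
            self.offsets[cid] -= delta
            self.offset_updates += 1


host = Host(realtime=1_000)
ids = [host.add_container(offset=500) for _ in range(1000)]

host.settime_fixed_offset(1_060)      # host steps forward by 60s
jumped = host.container_time(ids[0])  # container stepped forward too

host.settime_compensated(1_120)       # host steps by another 60s
stable = host.container_time(ids[0])  # container time did not move
```

The second variant is the cheap end of the "chasing containers"
problem: the work grows with the number of containers for every
change of host time, even without any NTP machinery per container.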

> If we have to use qemu today then that's where we're at, but again
> I expect our use case is representative of more than just glibc.

For testing purposes it might be. For real world use cases not so
much. People tend to rely on the coordinated nature of CLOCK_TAI and
CLOCK_REALTIME.

> Does checkpointing work robustly when userspace APIs use
> CLOCK_REALTIME (directly or indirectly) in the container?

AFAICT, yes. That was the conclusion of the lengthy discussion about
time name spaces and their requirements.

Here is the Linux Plumbers Conference session related to that:

https://www.youtube.com/watch?v=sjRUiqJVzOA

Thanks,

tglx