Re: Loosening time namespace restrictions

From: Andrey Vagin
Date: Mon Sep 25 2023 - 19:40:43 EST


On Fri, Sep 22, 2023 at 5:51 AM Michał Cłapiński <mclapinski@xxxxxxxxxx> wrote:
>
> Hello,
> I faced a problem with the current implementation of time ns while
> using it for container migration. I'd like users of CLOCK_MONOTONIC to
> notice as small of a jump as possible in the clock after migration,
> since according to the documentation "this clock does not count time
> that the system is suspended". In that case the formula for clock
> monotonic offset is "m1_monotonic - m2_monotonic - migration_downtime"
> where m<n>_monotonic is clock monotonic value on the n-th machine.
> Unfortunately due to time ns restrictions, I have to set the offsets
> before putting any process in the namespace. I also can't move
> multithreaded processes between namespaces. So I would have to know
> the migration downtime before the migration is close to done, which
> seems impossible. For that reason I'd like to drop the requirement of
> having to set the offsets before putting any processes in the
> namespace. What do you think? Is it possible to implement this and get
> it merged or should I forgo it? If you think it's possible, I'd
> appreciate any pointers on how to get this done (or how to solve my
> problem in another way).
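
To make the formula above concrete, here is a minimal sketch of the
arithmetic. It assumes both monotonic values are sampled at the same
instant (when the container is frozen), and that the result is written
as a "<clock> <secs> <nsecs>" line to /proc/<pid>/timens_offsets with a
non-negative nanoseconds field, as I understand time_namespaces(7). The
function names are made up for illustration.

```python
NSEC_PER_SEC = 1_000_000_000

def monotonic_offset_ns(m1_monotonic_ns, m2_monotonic_ns, downtime_ns):
    """Offset from the quoted formula: m1_monotonic - m2_monotonic - downtime."""
    return m1_monotonic_ns - m2_monotonic_ns - downtime_ns

def timens_offsets_line(clock, offset_ns):
    """Format an offset as '<clock> <secs> <nsecs>'.

    divmod() floors toward negative infinity, so nsecs always lands in
    [0, NSEC_PER_SEC) even when the offset is negative.
    """
    secs, nsecs = divmod(offset_ns, NSEC_PER_SEC)
    return f"{clock} {secs} {nsecs}"
```

The catch you describe is visible here: downtime_ns is only known once
the migration is nearly done, but the line has to be written before any
task enters the namespace.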

Disallowing offset changes once there are tasks in the target namespace
was one of the original requirements. I don't remember who came up with
it, but it still looks reasonable to me. The main idea is to minimize
side effects and to keep the code as simple as possible.

If we want to change this, we need to think about a few things:
* Timers. What should we do with timers that are already armed when
  offsets are changed?
* Synchronization. Right now, when offsets are changed, there are no
  readers, so we don't need any locks, atomics, etc. The performance of
  vdso clock_gettime was one of the major concerns, and we would have
  to think about it again if offsets can change under running tasks.
* Backward jumps. When offsets are changed, monotonic clocks can jump
  backward for processes inside the namespace.
* There may be a few other things that I missed.
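
To illustrate the timer and backward-jump points, here is a small
sketch. It uses a simplified model in which a task sees the host
monotonic clock plus a per-namespace offset, and an absolute timer is
kept in host time. This is an assumption for illustration, not the
kernel code, and all names are hypothetical.

```python
# Simplified model (an assumption, not kernel code): a task in a time
# namespace sees host CLOCK_MONOTONIC plus a per-namespace offset.

def ns_monotonic(host_ns, offset_ns):
    """Monotonic clock value as seen inside the namespace."""
    return host_ns + offset_ns

def host_expiry(ns_expiry_ns, offset_ns):
    """Host-clock expiry for an absolute timer armed in namespace time."""
    return ns_expiry_ns - offset_ns

def jumps_backward(host_t1, host_t2, old_offset, new_offset):
    """True if a read at host_t2 with the new offset returns less than
    an earlier read at host_t1 with the old offset."""
    return ns_monotonic(host_t2, new_offset) < ns_monotonic(host_t1, old_offset)

def rebase_timer(host_expiry_ns, old_offset, new_offset):
    """Host expiry that keeps the same namespace-time deadline after
    the offset changes from old_offset to new_offset."""
    ns_expiry = host_expiry_ns + old_offset
    return host_expiry(ns_expiry, new_offset)
```

For example, if the host clock advances by 10 ns while the offset drops
by 100 ns, jumps_backward(1000, 1010, 500, 400) is true: the task reads
1500 and then 1410. In this model every armed absolute timer would also
need something like rebase_timer() applied to it, which is part of the
complexity the current restriction avoids.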

Thanks,
Andrei