Re: [RFD] posix-timers: CRIU woes

From: Thomas Gleixner
Date: Wed May 10 2023 - 04:31:16 EST


Pavel!

On Wed, May 10 2023 at 12:36, Pavel Tikhomirov wrote:
> On 10.05.2023 05:42, Thomas Gleixner wrote:
>> So because of that half thought out user space ABI we are now up the
>> regression creek without a paddle, unless CRIU can accommodate a
>> different restore mechanism to lift this restriction from the kernel.
>>
>> Thoughts?
>
> Maybe we can do something similar to /proc/sys/kernel/ns_last_pid?
> Switch to a per-(process->signal) idr-based approach, with idr_set_cursor()
> setting the next posix timer ID from a new sysctl?
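The cursor idea can be modelled in a few lines of user space code. This is a toy sketch, not kernel code: the class and method names are hypothetical, and set_cursor() merely stands in for a write to the proposed sysctl, which would end up calling idr_set_cursor() on a per-process idr. Allocation searches upward from the cursor and wraps around, roughly like cyclic idr allocation.

```python
class CursorIdAllocator:
    """Toy model of cyclic ID allocation with a settable cursor.

    Hypothetical: set_cursor() stands in for writing the proposed sysctl,
    which would call idr_set_cursor() on the per-process idr.
    """

    def __init__(self, max_id=2**31 - 1):
        self.max_id = max_id
        self.cursor = 0          # where the next search starts
        self.used = set()        # IDs of live timers

    def set_cursor(self, next_id):
        if not 0 <= next_id <= self.max_id:
            raise ValueError("ID out of range")
        self.cursor = next_id

    def _step(self, id_):
        # Advance by one, wrapping around to 0 after max_id.
        return id_ + 1 if id_ < self.max_id else 0

    def alloc(self):
        start = id_ = self.cursor
        while id_ in self.used:
            id_ = self._step(id_)
            if id_ == start:
                raise OSError("no free timer IDs")
        self.used.add(id_)
        self.cursor = self._step(id_)
        return id_

    def free(self, id_):
        self.used.discard(id_)
```

A restorer would set the cursor to the wanted ID right before each timer_create(): after set_cursor(42), the next alloc() returns 42.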

I'm not a fan of such sysctls. We already have too many of them and that
particular one does not buy much.

We can simply let timer_create() or a new syscall create a timer at a
given ID.

That allows CRIU to restore any checkpointed process, no matter which
kernel version it came from, without doing this insane create/delete
dance.
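To illustrate the difference, here is a toy model comparing the create/delete dance with an explicit "create at ID" call. It assumes the kernel hands out IDs from a per-process counter that only moves forward; the want= parameter is a hypothetical stand-in for the proposed interface, and the function names are made up. This is roughly what the dance amounts to, not CRIU's actual code.

```python
class SequentialTimerIds:
    """Toy model of a per-process ID counter that only moves forward."""

    def __init__(self):
        self.next_id = 0
        self.live = set()

    def create(self, want=None):
        if want is not None:
            # Hypothetical "create at given ID" interface.
            if want in self.live:
                raise FileExistsError(want)
            self.live.add(want)
            return want
        while self.next_id in self.live:
            self.next_id += 1
        id_ = self.next_id
        self.next_id += 1
        self.live.add(id_)
        return id_

    def delete(self, id_):
        self.live.remove(id_)


def restore_by_dance(proc, ids):
    """Create/delete until each wanted ID comes out; returns syscall count."""
    calls = 0
    for target in sorted(ids):
        while True:
            got = proc.create()
            calls += 1
            if got == target:
                break
            if got > target:
                raise RuntimeError(f"ID {target} can no longer be reached")
            proc.delete(got)
            calls += 1
    return calls


def restore_at_id(proc, ids):
    """One create per timer with the explicit ID; returns syscall count."""
    for target in ids:
        proc.create(want=target)
    return len(ids)
```

Restoring timers 3 and 7 costs 14 calls with the dance and 2 with the explicit ID, and the dance cost grows with the size of the ID gaps.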

The downside is that this allows user space to create stupidly sparse
timer IDs even in the non-CRIU case, which increases per-process kernel
memory consumption and adds slightly more overhead in the signal delivery
path. The latter is a burden on the process owning the timer and does not
affect expiry, which is a context-stealing operation. The memory part
eventually needs some thought with respect to accounting.
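For a rough feel of the memory side, assume the timers of a process were kept in a per-process radix tree with 64 slots per node (the default fan-out of the XArray underneath idr). That storage layout is an assumption for illustration, and the node count below is a simplified model that ignores the XArray's special cases.

```python
def radix_nodes(ids, shift=6):
    """Count nodes of a 2**shift-way radix tree holding the given IDs.

    Simplified model: every level, including the root, is a full node, and
    the tree is only as tall as the largest ID requires.
    """
    ids = set(ids)
    if not ids:
        return 0
    top = max(ids)
    height = 1
    while top >> (shift * height):
        height += 1
    # At each level, IDs sharing the same prefix share one node.
    return sum(len({i >> (shift * level) for i in ids})
               for level in range(1, height + 1))
```

The same 64 timers cost one node when dense (radix_nodes(range(64)) is 1) but 65 nodes when spaced 64 apart (radix_nodes(range(0, 4096, 64)) is 65).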

If the 'explicit at ID' option is not used, the ID mechanism is
optimized for dense IDs: it hands out the first available ID in a
bottom-up search, which recovers holes created by timer_delete().
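The dense, hole-recovering policy is easy to model. The sketch below uses a linear scan for clarity; a kernel implementation would presumably use something like idr_alloc() with a start of 0 or a bitmap instead. The class name is hypothetical.

```python
class LowestFreeTimerIds:
    """Toy model: always hand out the lowest ID that is not in use."""

    def __init__(self):
        self.live = set()

    def create(self):
        id_ = 0
        while id_ in self.live:   # bottom-up search for the first hole
            id_ += 1
        self.live.add(id_)
        return id_

    def delete(self, id_):
        self.live.remove(id_)
```

After creating IDs 0, 1 and 2 and deleting 1, the next create() returns 1 again. The same property means that deleting a timer and creating a new one hands back the same ID, so the create/delete dance cannot walk forward under this scheme.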

Thanks,

tglx