Re: [RFD] posix-timers: CRIU woes

From: Pavel Tikhomirov
Date: Thu May 11 2023 - 00:13:04 EST




On 10.05.2023 16:30, Thomas Gleixner wrote:
Pavel!

On Wed, May 10 2023 at 12:36, Pavel Tikhomirov wrote:
On 10.05.2023 05:42, Thomas Gleixner wrote:
So because of that half thought out user space ABI we are now up the
regression creek without a paddle, unless CRIU can accommodate to a
different restore mechanism to lift this restriction from the kernel.

Thoughts?

Maybe we can do something similar to /proc/sys/kernel/ns_last_pid?
Switch to per-(process->signal) idr based approach with idr_set_cursor
to set next id for next posix timer from new sysctl?

I'm not a fan of such sysctls. We have already too many of them and that
particular one does not buy much.

Sorry, it was a bad idea, what you suggest below is much better.


We can simply let timer_create() or a new syscall create a timer at a
given ID.

Yes, this would work for CRIU. (Note: in a neighboring thread Andrei writes about adding a bit to sigevent.sigev_notify to request a timer with a specified ID; a new syscall is also a good option.)


That allows CRIU to restore any checkpointed process no matter which
kernel version it came from without doing this insane create/delete
dance.

Yes, for CRIU this kind of API change is a big improvement.
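
For readers unfamiliar with the create/delete dance mentioned above, here is a rough user-space sketch of the idea. It assumes the kernel hands out timer IDs sequentially per process, which is the behavior the restore relies on. The helper name restore_timer_at and the max_tries bound are made up for illustration; this is not CRIU's actual code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep creating and deleting timers until the kernel
 * returns the wanted ID. Assumes IDs are handed out sequentially.
 * Returns the ID on success, -1 on error or after max_tries attempts.
 */
int restore_timer_at(int wanted, int max_tries)
{
    struct sigevent sev;

    assert(wanted >= 0 && max_tries > 0);
    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_NONE;      /* no signal needed for the sketch */

    for (int i = 0; i < max_tries; i++) {
        int id = -1;

        /* Raw syscall: the kernel-side timer ID is a plain int. */
        if (syscall(SYS_timer_create, CLOCK_MONOTONIC, &sev, &id) < 0)
            return -1;
        if (id == wanted)
            return id;

        /* Wrong ID: throw the timer away and try the next one. */
        syscall(SYS_timer_delete, id);
        if (id > wanted)
            return -1;                  /* overshot, give up */
    }
    return -1;
}
```

Restoring a timer with a large ID this way costs one create/delete pair per skipped ID, which is the overhead a "create at given ID" interface would remove.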


The downside is that this allows creating stupidly sparse timer IDs
even for the non CRIU case, which increases per process kernel memory
consumption and creates slightly more overhead in the signal delivery
path. The latter is a burden on the process owning the timer and not
affecting expiry, which is a context stealing operation. The memory part
needs eventually some thoughts vs. accounting.

If the 'explicit at ID' option is not used then the ID mechanism is
optimized for dense IDs by using the first available ID in a bottom up
search, which recovers holes created by a timer_delete() operation.
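
The bottom-up search described above can be illustrated with a toy allocator. This is only a sketch of the policy (lowest free ID first, holes reused after a delete, plus an explicit "at ID" path); the names id_pool, id_alloc_lowest, id_alloc_at and id_free and the tiny MAX_IDS are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 64  /* hypothetical, tiny ID space for illustration */

struct id_pool {
    bool used[MAX_IDS];
};

/* Default path: return the lowest free ID, or -1 if the pool is full. */
int id_alloc_lowest(struct id_pool *p)
{
    for (int id = 0; id < MAX_IDS; id++) {
        if (!p->used[id]) {
            p->used[id] = true;
            return id;
        }
    }
    return -1;
}

/* "Explicit at ID" path: claim one specific ID; false if taken or invalid. */
bool id_alloc_at(struct id_pool *p, int id)
{
    if (id < 0 || id >= MAX_IDS || p->used[id])
        return false;
    p->used[id] = true;
    return true;
}

/* Freeing an ID leaves a hole that the next bottom-up search will fill. */
void id_free(struct id_pool *p, int id)
{
    assert(id >= 0 && id < MAX_IDS);
    p->used[id] = false;
}
```

With this policy, allocating 0, 1, 2, freeing 1 and allocating again yields 1, so IDs stay dense unless the explicit path is used.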

I'm not sure how kernel memory consumption increases with sparse timer IDs: the global hashtable (posix_timers_hashtable) is the same size anyway. Entries in the hlists can be distributed differently, since the hash depends directly on the ID, but we have the same number of entries. I'm probably missing something; why do we need dense IDs?
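
To make the point about the hashtable concrete, here is a small user-space model. It assumes a fixed table of 2^9 buckets and a multiplicative hash over (signal hash XOR timer ID), which is my simplified reading of the current code; sig_hash, bucket_of and fill are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 9                     /* assumed: 2^9 buckets, fixed size */
#define NBUCKETS  (1u << HASH_BITS)
#define GOLDEN_RATIO_32 0x61C88647u     /* multiplier of the kernel's hash_32() */

/* sig_hash stands in for a hash of the owning signal_struct pointer. */
unsigned bucket_of(uint32_t sig_hash, uint32_t id)
{
    return ((sig_hash ^ id) * GOLDEN_RATIO_32) >> (32 - HASH_BITS);
}

/* Count entries per bucket for a set of timer IDs; returns total entries. */
size_t fill(const uint32_t *ids, size_t n, uint32_t sig_hash,
            unsigned counts[NBUCKETS])
{
    size_t total = 0;

    for (size_t b = 0; b < NBUCKETS; b++)
        counts[b] = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned b = bucket_of(sig_hash, ids[i]);

        assert(b < NBUCKETS);
        counts[b]++;
        total++;
    }
    return total;
}
```

In this model, four timers occupy four entries whether their IDs are 0..3 or spread over the whole integer range; only the bucket each entry lands in changes.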


Thanks,

tglx

--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.