Re: [RFD] posix-timers: CRIU woes

From: Thomas Gleixner
Date: Thu May 11 2023 - 09:42:48 EST


On Thu, May 11 2023 at 17:52, Pavel Tikhomirov wrote:
> On 11.05.2023 17:36, Thomas Gleixner wrote:
>> On Thu, May 11 2023 at 11:17, Pavel Tikhomirov wrote:
>>> On 10.05.2023 16:16, Andrey Vagin wrote:
>>>>>
>>>>> So because of that half thought out user space ABI we are now up the
>>>>> regression creek without a paddle, unless CRIU can accomodate to a
>>>>> different restore mechanism to lift this restriction from the kernel.
>>>>>
>>>> If you give us a new API to create timers with specified id-s, we will
>>>> figure out how to live with it. It isn't good to ask users to update
>>>> CRIU to work on new kernels, but here are reasons and event improvements
>>>> for CRIU, so I think it's worth it.
>>>
>>> I agree, any API to create timers with specified id-s would work for new
>>> CRIU versions.
>>
>> The real question is whether this will cause any upheaval when a new
>> kernel meets a non-updated CRIU stack.
>
> Creation of posix timer would hang forever in this loop
> https://github.com/checkpoint-restore/criu/blob/33dd66c6fc93c47213aaa0447a94d97ba1fa56ba/criu/pie/restorer.c#L1185
> if old criu is run on new kernel (without consecutive id allocation) AFAICS.

Yes, because that "sanity" check

if ((long)next_id > args->posix_timers[i].spt.it_id)

which tries to establish whether the kernel provides timer IDs in strict
increasing order does not work for that case.

It "works" to detect the IDR case on older kernels by chance, but not
under all circumstances. Assume the following case:

Global IDR has a free slot at index 1

Restore tries to create a timer for index 2

That will also loop forever, unless some other process creates a timer
and occupies the free slot at index 1, right?

So this needs a fix anyway, which should be done so that the new kernel
case is at least properly detected.

But even then there is still the problem of "it worked before I upgraded
the kernel".

IOW, we are still up a creek without a paddle, unless you would be
willing to utilize the existing CRIU bug to distribute the 'deal with
new kernel' mechanics as a bug bounty :)

Fix for the loop termination below.

Thanks,

tglx
---
criu/pie/restorer.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)

--- a/criu/pie/restorer.c
+++ b/criu/pie/restorer.c
@@ -1169,10 +1169,10 @@ static int timerfd_arm(struct task_resto
static int create_posix_timers(struct task_restore_args *args)
{
int ret, i;
- kernel_timer_t next_id;
+ kernel_timer_t next_id, timer_id;
struct sigevent sev;

- for (i = 0; i < args->posix_timers_n; i++) {
+ for (i = 0, next_id = 0; i < args->posix_timers_n; i++) {
sev.sigev_notify = args->posix_timers[i].spt.it_sigev_notify;
sev.sigev_signo = args->posix_timers[i].spt.si_signo;
#ifdef __GLIBC__
@@ -1183,25 +1183,27 @@ static int create_posix_timers(struct ta
sev.sigev_value.sival_ptr = args->posix_timers[i].spt.sival_ptr;

while (1) {
- ret = sys_timer_create(args->posix_timers[i].spt.clock_id, &sev, &next_id);
+ ret = sys_timer_create(args->posix_timers[i].spt.clock_id, &sev, &timer_id);
if (ret < 0) {
pr_err("Can't create posix timer - %d\n", i);
return ret;
}

- if (next_id == args->posix_timers[i].spt.it_id)
+ if (timer_id != next_id) {
+ pr_err("Can't create timers, kernel don't give them consequently\n");
+ return -1;
+ }
+
+ next_id++;
+
+ if (timer_id == args->posix_timers[i].spt.it_id)
break;

- ret = sys_timer_delete(next_id);
+ ret = sys_timer_delete(timer_id);
if (ret < 0) {
- pr_err("Can't remove temporaty posix timer 0x%x\n", next_id);
+ pr_err("Can't remove temporaty posix timer 0x%x\n", timer_id);
return ret;
}
-
- if ((long)next_id > args->posix_timers[i].spt.it_id) {
- pr_err("Can't create timers, kernel don't give them consequently\n");
- return -1;
- }
}
}