Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

From: Oleg Nesterov
Date: Mon Jun 05 2023 - 08:40:23 EST


On 06/02, Eric W. Biederman wrote:
>
> static int vhost_task_fn(void *data)
> {
> struct vhost_task *vtsk = data;
> - int ret;
> + bool dead = false;
> +
> + for (;;) {
> + bool did_work;
> +
> + if (!dead && signal_pending(current)) {
> + struct ksignal ksig;
> + /*
> + * Calling get_signal can block in SIGSTOP,
> + * and the freezer. Or it can clear
> + * fatal_signal_pending and return non-zero.
> + */
> + dead = get_signal(&ksig);
> + if (dead)
> + set_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags);
> + }
> +
> + /* mb paired w/ kthread_stop */
> + set_current_state(TASK_INTERRUPTIBLE);
> +
> + did_work = vtsk->fn(vtsk->data);

I don't understand why do you set TASK_INTERRUPTIBLE before vtsk->fn(),
it seems that you could do this before the test_bit(FLAGS_STOP) below.
But probably I missed something and this is minor anyway...

> + if (!did_work) {
> + if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
> + __set_current_state(TASK_RUNNING);
> + break;

What if VHOST_TASK_FLAGS_STOP was set by us after get_signal() above ?
We need to ensure that in this case vhost_work_queue() can't add a new work,
nobody will flush it.

In fact, unless I missed something this can even race with vhost_dev_flush().

vhost_dev_flush: vhost_task_fn:

checks FLAGS_STOP, not set,
vhost_task_flush() returns false
gets SIGKILL, sets FLAGS_STOP

vtsk->fn() returns false

vhost_task_fn() exits.

vhost_work_queue();
wait_for_completion(&flush.wait_event);


and the last wait_for_completion() will hang forever.

Oleg.