Re: [RFC PATCH 5/8] vhost: Add callback that stops new work and waits on running ones

From: Eric W. Biederman
Date: Thu May 18 2023 - 14:38:44 EST


Mike Christie <michael.christie@xxxxxxxxxx> writes:

> On 5/18/23 9:18 AM, Christian Brauner wrote:
>>> @@ -352,12 +353,13 @@ static int vhost_worker(void *data)
>>> if (!node) {
>>> schedule();
>>> /*
>>> - * When we get a SIGKILL our release function will
>>> - * be called. That will stop new IOs from being queued
>>> - * and check for outstanding cmd responses. It will then
>>> - * call vhost_task_stop to exit us.
>>> + * When we get a SIGKILL we kick off a work to
>>> + * run the driver's helper to stop new work and
>>> + * handle completions. When they are done they will
>>> + * call vhost_task_stop to tell us to exit.
>>> */
>>> - vhost_task_get_signal();
>>> + if (vhost_task_get_signal())
>>> + schedule_work(&dev->destroy_worker);
>>> }
>>
>> I'm pretty sure you still need to actually call exit here. Basically
>> mirror what's done in io_worker_exit() minus the io specific bits.
>
> We do call do_exit(). Once destory_worker has flushed the device and
> all outstanding IO has completed it call vhost_task_stop(). vhost_worker()
> above then breaks out of the loop and returns and vhost_task_fn() does
> do_exit().

I am not certain how you want to structure this but you really should
not call get_signal after it returns positive before you call do_exit.

You are in complete uncharted and untested waters calling get_signal
multiple times, when get_signal figures the proper response is to
call do_exit itself.

Eric