Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work

From: Jens Axboe
Date: Tue Aug 10 2021 - 22:51:45 EST


On 8/10/21 8:33 PM, Nadav Amit wrote:
>
>
>> On Aug 10, 2021, at 2:32 PM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>>
>> On 8/10/21 9:28 AM, Nadav Amit wrote:
>>>
>>> Unfortunately, there seems to be yet another issue (unless my code
>>> somehow caused it). It seems that when SQPOLL is used, there are cases
>>> in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
>>> never goes down to zero.
>>>
>>> Debugging... (while also trying to make some progress with my code)
>>
>> It's most likely because a request has been lost (mis-refcounted).
>> Let us know if you need any help. Would be great to solve it for 5.14.
>> quick tips:
>>
>> 1) if not already, try out Jens' 5.14 branch
>> git://git.kernel.dk/linux-block io_uring-5.14
>>
>> 2) try to characterise the io_uring use pattern. Poll requests?
>> Read/write requests? Send/recv? Filesystem vs bdev vs sockets?
>>
>> If easily reproducible, you can match io_alloc_req() with it
>> getting into io_dismantle_req();
>
> So actually the problem is more of a missing IO-uring functionality
> that I need. When an I/O is queued for async completion (i.e., after
> returning -EIOCBQUEUED), there should be a way for io-uring to cancel
> these I/Os if needed.

There's no way to cancel file/bdev related IO, and there likely never
will be. That's basically the only exception, everything else can get
canceled pretty easily. Many things can be written on why that is the
case, and they have (myself included), but it boils down to proper
hardware support which we'll likely never have as it's not a well tested
path. For other kind of async IO, we're either waiting in poll (which is
trivially cancellable) or in an async thread (which is also easily
cancellable). For DMA+irq driven block storage, we'd need to be able to
reliably cancel on the hardware side, to prevent errant DMA after the
fact.

None of this is really io_uring specific, io_uring just suffers from the
same limitations as others would (or are).

> Otherwise they might potentially never complete, as happens in my
> use-case.

If you see that, that is most certainly a bug. While bdev/reg file IO
can't really be canceled, they all have the property that they complete
in finite time. Either the IO completes normally in a "short" amount of
time, or a timeout will cancel it and complete it in error. There are no
unbounded execution times for uncancellable IO.

> AIO has ki_cancel() for this matter. So I presume the proper solution
> would be to move ki_cancel() from aio_kiocb to kiocb so it can be used
> by both io-uring and aio. And then - to use this infrastructure.

There is no infrastructure, I'm fraid. ki_cancel() is just a random hook
that nobody (outside of USB gadget??) ever implemented or used.

> But it is messy. There is already a bug in the (few) uses of
> kiocb_set_cancel_fn() that blindly assume AIO is used and not
> IO-uring. Then, I am not sure about some things in the AIO code. Oh
> boy. I’ll work on an RFC.

ki_cancel is a non-starter, it doesn't even work for the single case
that it's intended for, and I'm actually surprised it hasn't been
removed yet. It's one of those things that someone added a hook for, but
never really grew into something that is useful.

--
Jens Axboe