Re: [PATCH v3 4/4] io_uring: add support for zone-append

From: Jens Axboe
Date: Sun Jul 05 2020 - 17:12:56 EST


On 7/5/20 3:09 PM, Matthew Wilcox wrote:
> On Sun, Jul 05, 2020 at 03:00:47PM -0600, Jens Axboe wrote:
>> On 7/5/20 12:47 PM, Kanchan Joshi wrote:
>>> From: Selvakumar S <selvakuma.s1@xxxxxxxxxxx>
>>>
>>> For zone-append, block-layer will return zone-relative offset via ret2
>>> of ki_complete interface. Make changes to collect it, and send to
>>> user-space using cqe->flags.
>>>
>>> Signed-off-by: Selvakumar S <selvakuma.s1@xxxxxxxxxxx>
>>> Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
>>> Signed-off-by: Nitesh Shetty <nj.shetty@xxxxxxxxxxx>
>>> Signed-off-by: Javier Gonzalez <javier.gonz@xxxxxxxxxxx>
>>> ---
>>> fs/io_uring.c | 21 +++++++++++++++++++--
>>> 1 file changed, 19 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 155f3d8..cbde4df 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -402,6 +402,8 @@ struct io_rw {
>>>  	struct kiocb kiocb;
>>>  	u64 addr;
>>>  	u64 len;
>>> +	/* zone-relative offset for append, in sectors */
>>> +	u32 append_offset;
>>>  };
>>
>> I don't like this very much at all. As it stands, the first cacheline
>> of io_kiocb is set aside for request-private data. io_rw is already
>> exactly 64 bytes, which means that you're now growing io_rw beyond
>> a cacheline and increasing the size of io_kiocb as a whole.
>>
>> Maybe you can reuse io_rw->len for this, as that is only used on the
>> submission side of things.
>
> I'm surprised you aren't more upset by the abuse of cqe->flags for the
> address.

Yeah, it's not great either, but we have less leeway there in terms of
how much space is available to pass back extra data.
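
To make the space constraint concrete, here is a rough sketch of the completion entry layout. The struct below is an illustrative copy named cqe_sketch (a made-up name, not the uapi header), and max_append_offset_bytes() is a hypothetical helper. It assumes the offset is reported in 512-byte sectors, as the comment in the quoted patch says.

```c
#include <stdint.h>

/* Illustrative copy of the 16-byte completion entry; not the uapi header. */
struct cqe_sketch {
	uint64_t user_data;	/* passed through unchanged from the sqe */
	int32_t  res;		/* result code for the request */
	uint32_t flags;		/* the only other field left to carry data */
};

/* The entry has no spare room beyond these three fields. */
_Static_assert(sizeof(struct cqe_sketch) == 16, "cqe sketch must be 16 bytes");

/*
 * Hypothetical helper: the span of byte offsets a u32 can describe when it
 * counts 512-byte sectors, i.e. 2^32 sectors shifted left by 9.
 */
static uint64_t max_append_offset_bytes(void)
{
	return ((uint64_t)UINT32_MAX + 1) << 9;
}
```

With these assumptions a 32-bit sector count spans 2 TiB, which is why a zone-relative value can fit in flags where an absolute 64-bit address could not.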

> What do you think to my idea of interpreting the user_data as being a
> pointer to somewhere to store the address? Obviously other things
> can be stored after the address in the user_data.

I don't like that at all, as all other commands just pass user_data
through. This means the application would have to treat this very
differently, and potentially not have a way to store any data for
locating the original command on the user side.

> Or we could have a separate flag to indicate that is how to interpret
> the user_data.

I'd be vehemently against changing user_data in any shape or form.
It's to be passed through from sqe to cqe, that's how the command flow
works. It's never kernel generated, and it's also used as a key for
command lookup.
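
As a minimal sketch of that flow from the application side, assuming the scheme from the quoted patch where the zone-relative offset comes back in cqe->flags: struct app_request, struct cqe_sketch, tag_request() and complete_request() are all hypothetical names used for illustration, not liburing or kernel API.

```c
#include <stdint.h>

/* Hypothetical per-request bookkeeping owned by the application. */
struct app_request {
	int      id;
	uint64_t zone_start_sector;	/* first sector of the targeted zone */
	uint64_t written_sector;	/* filled in when the append completes */
};

/* Illustrative stand-in for a completion entry; not the uapi header. */
struct cqe_sketch {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/* Submission side: the request pointer becomes the sqe's user_data. */
static uint64_t tag_request(struct app_request *req)
{
	return (uint64_t)(uintptr_t)req;
}

/*
 * Completion side: user_data arrives untouched, so it still locates the
 * original request. The offset is read from flags (assumed here to be
 * zone-relative and in sectors) and only on success.
 */
static struct app_request *complete_request(const struct cqe_sketch *cqe)
{
	struct app_request *req =
		(struct app_request *)(uintptr_t)cqe->user_data;

	if (cqe->res >= 0)
		req->written_sector = req->zone_start_sector + cqe->flags;
	return req;
}
```

If user_data were instead interpreted as a pointer to an address slot, the application would have to find another way to get from the completion back to its own request state.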

--
Jens Axboe