Re: [PATCH v4 3/4] ublk: enable zoned storage support

From: Andreas Hindborg (Samsung)
Date: Fri Jun 30 2023 - 09:28:32 EST



Damien Le Moal <dlemoal@xxxxxxxxxx> writes:

> On 6/29/23 16:50, Andreas Hindborg (Samsung) wrote:
>>>> UCLINUX (M68KNOMMU AND COLDFIRE)
>>>> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
>>>> index 5b9d4aaebb81..c58dfd035557 100644
>>>> --- a/drivers/block/Kconfig
>>>> +++ b/drivers/block/Kconfig
>>>> @@ -402,6 +402,10 @@ config BLKDEV_UBLK_LEGACY_OPCODES
>>>> suggested to enable N if your application(ublk server) switches to
>>>> ioctl command encoding.
>>>>
>>>> +config BLK_DEV_UBLK_ZONED
>>>> + def_bool y
>>>
>>> This can be "bool" only.
>>
>> I did it like this because I wanted BLK_DEV_UBLK_ZONED to be set
>> automatically to "y" when BLK_DEV_UBLK and BLK_DEV_ZONED are enabled.
>> BLK_DEV_UBLK_ZONED has no menu entry. If we change it to "bool" it will
>> be default "n" and we would have to make it appear in menuconfig to be
>> manually enabled.
>>
>> Isn't it more sane if it is just unconditionally enabled when
>> BLK_DEV_ZONED and BLK_DEV_UBLK is enabled?
>
> Maybe something like this then:
>
> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index 5b9d4aaebb81..105e1121bf5f 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -370,9 +370,13 @@ config BLK_DEV_RBD
>
> If unsure, say N.
>
> +config BLK_DEV_UBLK_ZONED
> + bool
> +
> config BLK_DEV_UBLK
> tristate "Userspace block driver (Experimental)"
> select IO_URING
> + select BLK_DEV_UBLK_ZONED if BLK_DEV_ZONED
> help
> io_uring based userspace block driver. Together with ublk server, ublk
> has been working well, but interface with userspace or command data

OK 👍

>>>> + ub->ub_disk->nr_zones = p->dev_sectors / p->chunk_sectors;
>>>> +}
>>>> +
>>>> +void ublk_dev_param_zoned_apply(struct ublk_device *ub)
>>>> +{
>>>> + const struct ublk_param_zoned *p = &ub->params.zoned;
>>>> +
>>>> + if (ub->dev_info.flags & UBLK_F_ZONED) {
>>>> + disk_set_max_active_zones(ub->ub_disk, p->max_active_zones);
>>>> + disk_set_max_open_zones(ub->ub_disk, p->max_open_zones);
>>>
>>> You do not need to check if the max_active_zones and max_open_zones values are
>>> sensible ? E.g. what if they are larger than the number of zones ?
>>
>> I don't think it will be a problem. I assume you can set them to 0 to
>> indicate infinite. If user space sets them to more than the actual
>> number of available zones (that user space set up also), user space will
>> have to handle that when it gets a zone request for a zone outside the
>> logical drive LBA space. I don't think the kernel has to care. Will this
>> break anything kernel side?
>
> zonefs and btrfs look at these limits. So I would rather prefer seeing sensible
> values. 0 == no limit is fine. But seeing maz or moz > nr_zones for the device
> is just too strange and I would prefer an error in that case to let the user
> know it is not doing something right. Getting moz > maz is also nonsense that
> needs to be checked. There is one case we could silently change: if maz ==
> nr_zones or moz == nr_zones, then that essentially means "no limit", so we could
> use 0 to let the in-kernel users that they do not need to care about the limits.

OK, will check.

>
>>
>>>
>>>> + }
>>>> +}
>>>> +
>>>> +int ublk_revalidate_disk_zones(struct gendisk *disk)
>>>> +{
>>>> + return blk_revalidate_disk_zones(disk, NULL);
>>>> +}
>>>
>>> I do not think this helper is needed at all (see below comment on the call site).
>>
>> This is required because the prototype for `blk_revalidate_disk_zones()`
>> is not defined when BLK_DEV_ZONED is not defined. Without this helper
>> for which the prototype is always defined, we will have a compile error
>> at the call site.
>
> You have this hunk in ublk_ctrl_start_dev:
>
> disk_set_zoned(disk, BLK_ZONED_HM);
> blk_queue_required_elevator_features(disk->queue, ELEVATOR_F_ZBD_SEQ_WRITE);
> ret = ublk_revalidate_disk_zones(disk);
> if (ret)
> goto out_put_disk;
>
> And that is called under "if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)". So I do not see
> how you can get compile errors using directly blk_revalidate_disk_zones(). The
> entire hunk above should be a helper function I think.

The prototype for `disk_set_zoned()` is defined independent of
CONFIG_BLK_DEV_ZONED. The prototype for `blk_revalidate_disk_zones()` is
only defined when CONFIG_BLK_DEV_ZONED is enabled. It is not the same.
We can't have a call to `blk_revalidate_disk_zones()` in a translation
unit when the prototype is not defined, even behind a `if
(IS_ENABLED())` guard. It would be fine behind an `#ifdef` though.

If we move the entire hunk to a helper for which the prototype is always
defined, we should be able to call `blk_revalidate_disk_zones()`
directly 👍 I'll see if I can do that.

>>>> +
>>>> + if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>>>> + ublk_set_nr_zones(ub);
>>>
>>> So if the user is attempting to setup a zoned drive but the kernel does not have
>>> CONFIG_BLK_DEV_ZONED=y, the user setup will be silently ignored. Not exactly
>>> nice I think, unless I am missing something.
>>
>> We have this in `ublk_ctrl_add_dev()`:
>>
>> if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> ub->dev_info.flags &= ~UBLK_F_ZONED;
>>
>> User space is supposed to check the flags after the call to know the
>> kernel capabilities.
>
> And you trust user space to be always correct ? I never do :)
> So definitely, check and error if not supported. No silently clearing the device
> zoned flag.
>

I'll make it fail instead of just clearing the flag 👍

>
>>> Also, repeating that "if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))" for all zone
>>> related functions is very verbose. Stub the functions in ublk_drv.h. That will
>>> make the main C code lighter.
>>
>> Not sure what you mean "stub the functions". Like so:
>>
>> @@ -216,8 +216,7 @@ static void ublk_dev_param_basic_apply(struct ublk_device *ub)
>>
>> set_capacity(ub->ub_disk, p->dev_sectors);
>>
>> - if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> - ublk_set_nr_zones(ub);
>> + ublk_config_nr_zones(ub);
>> }
>>
>> And then in the header:
>>
>> void ublk_config_nr_zones(struct ublk_device *ub)
>> {
>> if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> ublk_set_nr_zones(ub);
>> }
>>
>> Or something else?
>
> #ifndef CONFIG_BLK_DEV_ZONED
> static inline void ublk_set_nr_zones() {}
> #endif
>
> That avoid repeating that if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) all over the
> place. The stubbed functions can be called only for "if (ub->dev_info.flags &
> UBLK_F_ZONED", which is OK since this flag is never supposed to be accepted and
> set for the CONFIG_BLK_DEV_ZONED=n case.

Alright 👍

>
>>>> + disk_set_zoned(disk, BLK_ZONED_HM);
>>>> + blk_queue_required_elevator_features(disk->queue, ELEVATOR_F_ZBD_SEQ_WRITE);
>>>> + ret = ublk_revalidate_disk_zones(disk);
>>>> + if (ret)
>>>> + goto out_put_disk;
>>>
>>> This should be all a helper ublk_set_zoned() or something.
>>
>> Ok 👍 Unfortunately this block of code is split up in the zone append
>> patch. I will see what I can do, maybe 2 functions.
>
> Beware that I just posted patches that require zone size (chunk_sectors limit)
> and max append sectors limit to be set *before* calling
> blk_revalidate_disk_zones(). Otherwise, you will get an error.

Got it, thanks.

>
>>>> get_device(&ub->cdev_dev);
>>>> ub->dev_info.state = UBLK_S_DEV_LIVE;
>>>> ret = add_disk(disk);
>>>> @@ -1997,6 +2046,9 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
>>>> if (ub->dev_info.flags & UBLK_F_USER_COPY)
>>>> ub->dev_info.flags &= ~UBLK_F_NEED_GET_DATA;
>>>>
>>>> + if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>>>> + ub->dev_info.flags &= ~UBLK_F_ZONED;
>>>
>>> Arg, no. The user should be notified with an error that he/she is attempting to
>>> create a zoned device that cannot be supported.
>>
>> As I wrote above, this is part of the ublk uapi. This is the way user
>> space does kernel capability check. User space will check the flags when
>> this call returns. @Ming did I understand this correctly or did I miss something?
>
> OK. So this is like a virtio capability flags thing ? Negotiation between device
> and driver ?

I'm not familiar with that one, but I think you are right we should fail
the request here. I makes no sense to configure a regular ublk device if
user space wants to set up a zoned one. Failing is more sensible.

>
>>>> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
>>>> index 471b3b983045..436525afffe8 100644
>>>> --- a/include/uapi/linux/ublk_cmd.h
>>>> +++ b/include/uapi/linux/ublk_cmd.h
>>>> @@ -176,6 +176,11 @@
>>>> /* Copy between request and user buffer by pread()/pwrite() */
>>>> #define UBLK_F_USER_COPY (1UL << 7)
>>>>
>>>> +/*
>>>> + * Enable zoned device support
>>>
>>> Isn't this for "Indicate that the device is zoned" ?
>>
>> User space sets this flag when setting up the device to enable zoned
>> storage support. Kernel may reject it. I think the wording is
>> sufficient. Could be updated to:
>>
>> "User space sets this flag when setting up the device to request zoned
>> storage support. Kernel may deny the request by clearing the flag
>> during setup."
>
> I really have a big problem with this. If the user driver does not check that
> the flag was cleared, the user will assume that the device is zoned, but it is
> not. This is not super nice... I would really prefer just failing the devie
> creation if zoned was requested but cannot be satisfied, either because the
> kernel does not support zoned block devices or when the user specified something
> nonsensical.

Changed to:

/*
* User space sets this flag when setting up the device to request zoned storage support. Kernel may
* deny the request by returning an error.
*/
#define UBLK_F_ZONED (1ULL << 8)

Best regards,
Andreas