Re: [PATCH v2 02/21] block, blkfilter: Block Device Filtering Mechanism

From: Sergei Shtepa
Date: Wed Feb 01 2023 - 08:17:09 EST



On 2/1/23 00:58, Mike Snitzer wrote:
> Subject:
> Re: [PATCH v2 02/21] block, blkfilter: Block Device Filtering Mechanism
> From:
> Mike Snitzer <snitzer@xxxxxxxxxx>
> Date:
> 2/1/23, 00:58
>
> To:
> Sergei Shtepa <sergei.shtepa@xxxxxxxxx>
> CC:
> axboe@xxxxxxxxx, corbet@xxxxxxx, linux-block@xxxxxxxxxxxxxxx, linux-doc@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, dm-devel@xxxxxxxxxx
>
>
> On Fri, Dec 09 2022 at 9:23P -0500,
> Sergei Shtepa <sergei.shtepa@xxxxxxxxx> wrote:
>
>> Allows to attach block device filters to the block devices. Kernel
>> modules can use this functionality to extend the capabilities of the
>> block layer.
>>
>> Signed-off-by: Sergei Shtepa <sergei.shtepa@xxxxxxxxx>
>> ---
>> block/bdev.c | 70 ++++++++++++++++++++++++++++++++++++++
>> block/blk-core.c | 19 +++++++++--
>> include/linux/blk_types.h | 2 ++
>> include/linux/blkdev.h | 71 +++++++++++++++++++++++++++++++++++++++
>> 4 files changed, 160 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/bdev.c b/block/bdev.c
>> index d699ecdb3260..b820178824b2 100644
>> --- a/block/bdev.c
>> +++ b/block/bdev.c
>> @@ -427,6 +427,7 @@ static void init_once(void *data)
>>
>> static void bdev_evict_inode(struct inode *inode)
>> {
>> + bdev_filter_detach(I_BDEV(inode));
>> truncate_inode_pages_final(&inode->i_data);
>> invalidate_inode_buffers(inode); /* is it needed here? */
>> clear_inode(inode);
>> @@ -502,6 +503,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
>> return NULL;
>> }
>> bdev->bd_disk = disk;
>> + bdev->bd_filter = NULL;
>> return bdev;
>> }
>>
>> @@ -1092,3 +1094,71 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>>
>> blkdev_put_no_open(bdev);
>> }
>> +
>> +/**
>> + * bdev_filter_attach - Attach the filter to the original block device.
>> + * @bdev:
>> + * Block device.
>> + * @flt:
>> + * Filter that needs to be attached to the block device.
>> + *
>> + * Before adding a filter, it is necessary to initialize &struct bdev_filter
>> + * using a bdev_filter_init() function.
>> + *
>> + * The bdev_filter_detach() function allows to detach the filter from the block
>> + * device.
>> + *
>> + * Return: 0 if succeeded, or -EALREADY if the filter already exists.
>> + */
>> +int bdev_filter_attach(struct block_device *bdev,
>> + struct bdev_filter *flt)
>> +{
>> + int ret = 0;
>> +
>> + blk_mq_freeze_queue(bdev->bd_queue);
>> + blk_mq_quiesce_queue(bdev->bd_queue);
>> +
>> + if (bdev->bd_filter)
>> + ret = -EALREADY;
>> + else
>> + bdev->bd_filter = flt;
>> +
>> + blk_mq_unquiesce_queue(bdev->bd_queue);
>> + blk_mq_unfreeze_queue(bdev->bd_queue);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(bdev_filter_attach);
>> +
>> +/**
>> + * bdev_filter_detach - Detach the filter from the block device.
>> + * @bdev:
>> + * Block device.
>> + *
>> + * The filter should be added using the bdev_filter_attach() function.
>> + *
>> + * Return: 0 if succeeded, or -ENOENT if the filter was not found.
>> + */
>> +int bdev_filter_detach(struct block_device *bdev)
>> +{
>> + int ret = 0;
>> + struct bdev_filter *flt = NULL;
>> +
>> + blk_mq_freeze_queue(bdev->bd_queue);
>> + blk_mq_quiesce_queue(bdev->bd_queue);
>> +
>> + flt = bdev->bd_filter;
>> + if (flt)
>> + bdev->bd_filter = NULL;
>> + else
>> + ret = -ENOENT;
>> +
>> + blk_mq_unquiesce_queue(bdev->bd_queue);
>> + blk_mq_unfreeze_queue(bdev->bd_queue);
>> +
>> + if (flt)
>> + bdev_filter_put(flt);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(bdev_filter_detach);
> What about bio-based devices? (DM, MD, etc)
>
> DM uses freeze_bdev() and thaw_bdev(), seems like you're missing some
> work here.

Thanks, Mike.

We are planning to add a freeze_bdev() function call in bdev_filter_attach().
But for the bdev_filter_detach() function, it doesn't seem to make sense.
I think enough to call blk_mq_freeze_queue().

As Fabio already wrote, I use a public repository on github to work with
the patch: https://github.com/SergeiShtepa/linux/commits/blksnap-master
The current state can be viewed there. Feedback is welcome as usual.

>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 5487912befe8..284b295a7b23 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -678,9 +678,24 @@ void submit_bio_noacct_nocheck(struct bio *bio)
>> * to collect a list of requests submited by a ->submit_bio method while
>> * it is active, and then process them after it returned.
>> */
>> - if (current->bio_list)
>> + if (current->bio_list) {
>> bio_list_add(&current->bio_list[0], bio);
>> - else if (!bio->bi_bdev->bd_disk->fops->submit_bio)
>> + return;
>> + }
>> +
>> + if (bio->bi_bdev->bd_filter && !bio_flagged(bio, BIO_FILTERED)) {
> Shouldn't this be: if (unlikely(...))?
>
> But that obviously assumes a fair amount about the only consumer
> (temporary filter that lasts as long as it takes to do a backup).

Yes, at the moment the code is being created so that only one filter
is possible. In the summer, I offered a more complex solution, in which
there were altitudes. See:
https://lore.kernel.org/linux-block/1655135593-1900-2-git-send-email-sergei.shtepa@xxxxxxxxx/
But this is redundant code for this task at the moment, since only
one filter is offered now. I think it will be possible to implement
something similar later.

>
>> + bool pass;
>> +
>> + pass = bio->bi_bdev->bd_filter->fops->submit_bio_cb(bio);
>> + bio_set_flag(bio, BIO_FILTERED);
>> + if (!pass) {
>> + bio->bi_status = BLK_STS_OK;
>> + bio_endio(bio);
>> + return;
>> + }
>> + }
>> +
>> + if (!bio->bi_bdev->bd_disk->fops->submit_bio)
>> __submit_bio_noacct_mq(bio);
>> else
>> __submit_bio_noacct(bio);
> And you currently don't allow for blkfilter to be involved if a bio
> recurses (which is how bio splitting works now). Not sure it
> matters, just mentioning it...
>
> But taking a step back, in the hopes of stepping out of your way:
>
> Myself and others on the DM team (past and present) have always hoped
> all block devices could have the flexibility of DM. It was that hope
> that caused my frustration when I first saw your blkfilter approach.
>
> But I was too idealistic that a byproduct of your efforts
> (blk-interposer before and blkfilter now) would usher in _all_ block
> devices being able to comprehensively change their identity (and IO
> processing) like DM enjoys.
>
> DM showcases all the extra code needed to achieve its extreme IO
> remapping and stacking flexibilty -- I don't yet see a way to distill
> the essence of what DM achieves without imposing too much on all block
> core.
>
> So I do think blkfilter is a pragmatic way to achieve your goals.
>
> Mike
>