Re: [PATCH v5 02/11] block: Block Device Filtering Mechanism

From: Sergei Shtepa
Date: Wed Jul 19 2023 - 04:37:09 EST




On 7/19/23 09:28, Yu Kuai wrote:
> Subject:
> Re: [PATCH v5 02/11] block: Block Device Filtering Mechanism
> From:
> Yu Kuai <yukuai1@xxxxxxxxxxxxxxx>
> Date:
> 7/19/23, 09:28
>
> To:
> Sergei Shtepa <sergei.shtepa@xxxxxxxxx>, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx>, axboe@xxxxxxxxx, hch@xxxxxxxxxxxxx, corbet@xxxxxxx, snitzer@xxxxxxxxxx
> CC:
> viro@xxxxxxxxxxxxxxxxxx, brauner@xxxxxxxxxx, dchinner@xxxxxxxxxx, willy@xxxxxxxxxxxxx, dlemoal@xxxxxxxxxx, linux@xxxxxxxxxxxxxx, jack@xxxxxxx, ming.lei@xxxxxxxxxx, linux-block@xxxxxxxxxxxxxxx, linux-doc@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, Donald Buczek <buczek@xxxxxxxxxxxxx>, "yukuai (C)" <yukuai3@xxxxxxxxxx>
>
>
> Hi,
>
> On 2023/07/19 0:33, Sergei Shtepa wrote:
>>
>>
>> On 7/18/23 14:32, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/07/18 19:25, Sergei Shtepa wrote:
>>>> Hi.
>>>>
>>>> On 7/18/23 03:37, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> On 2023/07/17 22:39, Sergei Shtepa wrote:
>>>>>>
>>>>>>
>>>>>> On 7/11/23 04:02, Yu Kuai wrote:
>>>>>>> bdev_disk_changed() is not handled, where delete_partition() and
>>>>>>> add_partition() will be called, this means blkfilter for partition will
>>>>>>> be removed after partition rescan. Am I missing something?
>>>>>>
>>>>>> Yes, when the bdev_disk_changed() is called, all disk block devices
>>>>>> are deleted and new ones are re-created. Therefore, the information
>>>>>> about the attached filters will be lost. This is equivalent to
>>>>>> removing the disk and adding it back.
>>>>>>
>>>>>> For the blksnap module, a partition rescan will mean the loss of the
>>>>>> change tracker's data. If a snapshot was created, then such
>>>>>> a partition rescan will cause the snapshot to be corrupted.
>>>>>>
>>>>>
>>>>> I haven't reviewed the blksnap code yet, but this sounds like a problem.
>>>>
>>>> I can't imagine a case where this could be a problem.
>>>> Partition rescan is possible only if no file system is mounted
>>>> on any of the disk partitions. Otherwise, ioctl BLKRRPART returns
>>>> -EBUSY. Therefore, during normal operation of the system, rescan is
>>>> not performed.
>>>> And if the file systems have not been mounted, it is possible that
>>>> the disk partition structure has changed or the disk in the media
>>>> device has changed. In this case, it is better to detach the
>>>> filter, otherwise it may lead to incorrect operation of the module.
>>>>
>>>> We can add prechange/postchange callback functions so that the
>>>> filter can track the rescan process. But at the moment, this is not
>>>> necessary for the blksnap module.
>>>
>>> So you mean that blkfilter is only used when the partition is
>>> mounted? (Or do you mean when the partition is opened?)
>>>
>>> Then, I think you mean that the filter should only be used for a
>>> partition that is opened? Otherwise, the filter can be gone at any time,
>>> since a partition rescan can happen.
>>>
>>> //user
>>> 1. attach filter
>>>          // other context rescan partition
>>> 2. mount fs
>>> // user will find the filter is gone.
>>
>> Mmm...  The fact is that at the moment the user of the filter is the
>> blksnap module. There are no other filter users yet. The blksnap module
>> solves the problem of creating snapshots, primarily for backup purposes.
>> Therefore, the main use case is to attach a filter on an already running
>> system, where all partitions are laid out and file systems are mounted.
>>
>> If the server is being serviced, during which the disk is being
>> re-partitioned, then disabling the filter is normal. In this case, the
>> change tracker will be reset, and at the next backup, the filter will be
>> attached again.
>
> Thanks for the explanation, I was thinking that blksnap can replace
> dm-snapshot.

Thanks!
At the moment I am creating blksnap with the needs of the Veeam product
in mind. I would be glad if blksnap were useful in other products as well.
If you have any thoughts/questions/suggestions/comments, then write to me
directly. I'll be happy to discuss everything.
To work on the patch, I use the branch here:
Link: https://github.com/SergeiShtepa/linux/tree/blksnap-master
The user-space libs, tools and tests compatible with the upstream are here:
Link: https://github.com/veeam/blksnap/tree/stable-v2.0
Perhaps they will be useful to you.

>
> Thanks,
> Kuai
>
>>
>> But if I were to solve the problem of preserving the filter across a rescan,
>> it would be necessary to take into account the UUID and name of the partition
>> (struct partition_meta_info). It is unacceptable that, due to a change in the
>> structure of partitions, the filter is attached to another partition by mistake.
>> It would also be good to add a changed() callback so that the filter receives
>> a notification that the block device has been updated.
>>
>> But I'm not sure that this should be done, since if some code is not used in
>> the kernel, then it should not be in the kernel.
>>
>>>
>>> Thanks,
>>> Kuai
>>>
>>>>
>>>> Therefore, I will refrain from making changes for now.
>>>>
>>>>>
>>>>> possible solutions I have in mind:
>>>>>
>>>>> 1. Store blkfilter for each partition from bdev_disk_changed() before
>>>>> delete_partition(), and add blkfilter back after add_partition().
>>>>>
>>>>> 2. Store blkfilter from gendisk as an xarray, and protect it by
>>>>> 'open_mutex' like 'part_tbl', block_device can keep the pointer to
>>>>> reference blkfilter so that performance from fast path is ok, and the
>>>>> lifetime of blkfilter can be managed separately.
>>>>>
>>>>>> There was an idea to do filtering at the disk level,
>>>>>> but I abandoned it.
>>>>>>
>>>>> I think it's better to do filtering at the partition level as well.
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>
>>>
>>
>