Re: [dm-devel] new patchset to eliminate DM's use of BIOSET_NEED_RESCUER [was: Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.]

From: NeilBrown
Date: Tue Nov 21 2017 - 18:03:32 EST


On Tue, Nov 21 2017, Mikulas Patocka wrote:

> On Tue, 21 Nov 2017, Mike Snitzer wrote:
>
>> On Tue, Nov 21 2017 at 7:43am -0500,
>> Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>>
>> > Decided it a better use of my time to review and then hopefully use the
>> > block-core's bio splitting infrastructure in DM. Been meaning to do
>> > that for quite a while anyway. This mail from you just made it all the
>> > more clear that needs doing:
>> > https://www.redhat.com/archives/dm-devel/2017-September/msg00098.html
>> >
>> > So I will start here on this patch you proposed:
>> > https://www.redhat.com/archives/dm-devel/2017-September/msg00091.html
>> > (of note, this patch slipped through the cracks because I was recovering
>> > from injury when it originally came through).
>> >
>> > Once DM is using q->bio_split I'll come back to this patch (aka
>> > "[1]") as a starting point for the follow-on work to remove DM's use of
>> > BIOSET_NEED_RESCUER:
>> > https://www.redhat.com/archives/dm-devel/2017-August/msg00315.html
>>
>> Hey Neil,
>>
>> Good news! All your code works ;)
>>
>> (well after 1 fixup due to a cut-n-paste bug.. the code you added to
>> dm_wq_work() to process the md->rescued bio_list was operating on
>> the md->deferred bio_list due to cut-n-paste from code you copied from
>> just below it)
>>
>> I split your code out some to make it more reviewable. I also tweaked
>> headers accordingly.
>>
>> Please see this branch (which _will_ get rebased between now and the
>> 4.16 merge window):
>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.16
>>
>> I successfully tested these changes using Mikulas' test program that
>> reproduces the snapshot deadlock:
>> https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
>>
>> I'll throw various other DM testsuites at it to verify they all look
>> good (e.g. thinp, cache, multipath).
>>
>> I'm open to all suggestions about changes you'd like to see (either to
>> these patches or anything you'd like to layer ontop of them).
>>
>> Thanks for all your work, much appreciated!
>> Mike
>
> This is not correct:

Thanks for your review!

>
> 2206 static void dm_wq_work(struct work_struct *work)
> 2207 {
> 2208 struct mapped_device *md = container_of(work, struct mapped_device, work);
> 2209 struct bio *bio;
> 2210 int srcu_idx;
> 2211 struct dm_table *map;
> 2212
> 2213 if (!bio_list_empty(&md->rescued)) {
> 2214 struct bio_list list;
> 2215 spin_lock_irq(&md->deferred_lock);
> 2216 list = md->rescued;
> 2217 bio_list_init(&md->rescued);
> 2218 spin_unlock_irq(&md->deferred_lock);
> 2219 while ((bio = bio_list_pop(&list)))
> 2220 generic_make_request(bio);
> 2221 }
> 2222
> 2223 map = dm_get_live_table(md, &srcu_idx);
> 2224
> 2225 while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> 2226 spin_lock_irq(&md->deferred_lock);
> 2227 bio = bio_list_pop(&md->deferred);
> 2228 spin_unlock_irq(&md->deferred_lock);
> 2229
> 2230 if (!bio)
> 2231 break;
> 2232
> 2233 if (dm_request_based(md))
> 2234 generic_make_request(bio);
> 2235 else
> 2236 __split_and_process_bio(md, map, bio);
> 2237 }
> 2238
> 2239 dm_put_live_table(md, srcu_idx);
> 2240 }
>
> You can see that if we are in dm_wq_work in __split_and_process_bio, we
> will not process md->rescued list.

Correct, but md->rescued will be empty, or irrelevant.
The first section of dm_wq_work ensures ->rescued is empty.
When __split_and_process_bio() calls generic_make_request() (indirectly
through one or more targets) they will not be recursive calls,
so nothing will be added to current->bio_list[0] and nothing
will be moved to md->rescued. Each generic_make_request() will
completely submit the request in the lower level devel.

Some other thread could call generic_make_request on this dm device and
result in bios appeared on md->rescued. These bios could only be a
problem if something that __split_and_process_bio calls might wait for
them. I don't think that happens (at least I don't think it should...).


>
> The processing of md->rescued is also wrong - bios for different devices
> must be offloaded to different helper threads, so that processing a bio
> for a lower device doesn't depend on processing a bio for a higher device.
> If you offload all the bios on current->bio_list to the same thread, the
> bios still depend on each other and the deadlock will still happen.

bios on current->bio_list[0] are not allowed to depend on each other
except that later bios can depend on earlier bios. They are all for a
lower-level device and should be largely independent.
The sorting that generic_make_request now does ensure that a bio for a
higher level device is never processed when a bio for a lower level
device, that it might depend on, is stuck on current->bio_list.

So I don't think there is a problem here.
Do you find this argument at all convincing?

Thanks,
NeilBrown

>
> Mikulas
>
> --
> dm-devel mailing list
> dm-devel@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/dm-devel

Attachment: signature.asc
Description: PGP signature