Re: [PATCH BUGFIX V2] block, bfq: update wr_busy_queues if needed on a queue split

From: Paolo Valente
Date: Wed Jun 28 2017 - 09:44:35 EST



> On 28 Jun 2017, at 14:42, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 06/27/2017 11:39 PM, Paolo Valente wrote:
>>
>>> On 27 Jun 2017, at 20:29, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> On 06/27/2017 12:27 PM, Paolo Valente wrote:
>>>>
>>>>> On 27 Jun 2017, at 16:41, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>
>>>>> On 06/27/2017 12:09 AM, Paolo Valente wrote:
>>>>>>
>>>>>>> On 19 Jun 2017, at 13:43, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> This commit fixes a bug triggered by a non-trivial sequence of
>>>>>>> events. These events are briefly described in the next two
>>>>>>> paragraphs. The impatient, or those who are familiar with queue
>>>>>>> merging and splitting, can jump directly to the last paragraph.
>>>>>>>
>>>>>>> On each I/O-request arrival for a shared bfq_queue, i.e., for a
>>>>>>> bfq_queue that is the result of the merge of two or more bfq_queues,
>>>>>>> BFQ checks whether the shared bfq_queue has become seeky (i.e., if too
>>>>>>> many random I/O requests have arrived for the bfq_queue; if the device
>>>>>>> is non-rotational, then random requests must also be small for the
>>>>>>> bfq_queue to be tagged as seeky). If the shared bfq_queue is actually
>>>>>>> detected as seeky, then a split occurs: the bfq I/O context of the
>>>>>>> process that has issued the request is redirected from the shared
>>>>>>> bfq_queue to a new non-shared bfq_queue. As a degenerate case, if the
>>>>>>> shared bfq_queue actually happens to be shared only by one process
>>>>>>> (because of previous splits), then no new bfq_queue is created: the
>>>>>>> state of the shared bfq_queue is just changed from shared to
>>>>>>> non-shared.
>>>>>>>
>>>>>>> Regardless of whether a brand new non-shared bfq_queue is created, or
>>>>>>> the pre-existing shared bfq_queue is just turned into a non-shared
>>>>>>> bfq_queue, several parameters of the non-shared bfq_queue are set
>>>>>>> (restored) to the original values they had when the bfq_queue
>>>>>>> associated with the bfq I/O context of the process (that has just
>>>>>>> issued an I/O request) was merged with the shared bfq_queue. One of
>>>>>>> these parameters is the weight-raising state.
>>>>>>>
>>>>>>> If, on the split of a shared bfq_queue,
>>>>>>> 1) a pre-existing shared bfq_queue is turned into a non-shared
>>>>>>> bfq_queue;
>>>>>>> 2) the previously shared bfq_queue happens to be busy;
>>>>>>> 3) the weight-raising state of the previously shared bfq_queue happens
>>>>>>> to change;
>>>>>>> the number of weight-raised busy queues changes. The field
>>>>>>> wr_busy_queues must then be updated accordingly, but such an update
>>>>>>> was missing. This commit adds the missing update.
>>>>>>>
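
For readers who prefer code to prose, the missing update described in
the last paragraph above can be modeled roughly as follows. This is
only a minimal sketch under stated assumptions, not the patch itself:
the names model_sched, model_queue and model_resume_state are
hypothetical stand-ins, and the sketch assumes that a weight-raising
coefficient greater than 1 means "weight-raised".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct model_sched {
	int wr_busy_queues;	/* busy queues that are weight-raised */
};

struct model_queue {
	unsigned int wr_coeff;	/* assumption: > 1 means weight-raised */
	bool busy;		/* queue currently has pending I/O */
};

/*
 * Restore the weight-raising state saved at merge time (the split
 * case in which a pre-existing shared queue becomes non-shared) and
 * keep the counter of weight-raised busy queues consistent.
 */
static void model_resume_state(struct model_sched *d,
			       struct model_queue *q,
			       unsigned int saved_wr_coeff)
{
	bool was_raised = q->wr_coeff > 1;
	bool is_raised;

	q->wr_coeff = saved_wr_coeff;
	is_raised = q->wr_coeff > 1;

	/* Only a busy queue whose state actually changed matters. */
	if (!q->busy || was_raised == is_raised)
		return;

	if (is_raised) {
		d->wr_busy_queues++;
	} else {
		assert(d->wr_busy_queues > 0);
		d->wr_busy_queues--;
	}
}
```

In this model, restoring a saved coefficient greater than 1 on a busy
queue that was not weight-raised increments the counter, the opposite
transition decrements it, and a non-busy queue leaves it untouched.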
>>>>>>
>>>>>> Hi Jens,
>>>>>> any idea of the possible fate of this fix?
>>>>>
>>>>> I sort of missed this one. It looks trivial enough for 4.12, or we
>>>>> can defer until 4.13. What do you think?
>>>>>
>>>>
>>>> It should actually be something trivial, and hopefully correct,
>>>> because a further throughput improvement (for BFQ), which depends on
>>>> this fix, is now working properly, and we didn't see any regression so
>>>> far. In addition, since this improvement is virtually ready for
>>>> submission, further steps will probably be easier if this fix gets in
>>>> sooner (whatever the fate of the improvement turns out to be).
>>>
>>> OK, let's queue it up for 4.13 then.
>>>
>>
>> My argument was actually in favor of 4.12. Maybe you meant 4.12
>> here?
>
> You were talking about further improvements and new development on top
> of this, so I assumed you meant 4.13. However, further development is
> not the main criterion or concern for whether this fix should go into
> 4.12 or not.

Ok, thanks for your explanation and patience.

> The main concern is if this fixes something that is crucial
> to have in 4.12. It's late in the cycle, I'd rather not push anything
> that isn't a regression fix at this point.
>

Hard to assess precisely how crucial this is. Certainly it fixes a
regression. The practical, negative effects of this regression are
systematic when one tries to add the throughput improvement I
mentioned: the improvement almost never works. If BFQ is used as it
is, then negative effects on throughput are less likely to happen.

I hope that this piece of information is somehow useful for your
decision.

Thanks,
Paolo

> --
> Jens Axboe