Re: [RFC PATCH] blk-throttle: add burst allowance.

From: Khazhismel Kumykov
Date: Mon Dec 18 2017 - 15:40:00 EST


On Mon, Dec 18, 2017 at 10:29 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Mon, Dec 18, 2017 at 10:16:02AM -0800, Khazhismel Kumykov wrote:
>> On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
>> > On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>> >> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
>> >>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>> >>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>> >>> >> Allows configuring additional bytes or ios before a throttle is
>> >>> >> triggered.
>> >>> >>
>> >>> >> This allows implementation of a bucket style rate-limit/throttle on a
>> >>> >> block device. Previously, bursting to a device was limited to allowance
>> >>> >> granted in a single throtl_slice (similar to a bucket with limit N and
>> >>> >> refill rate N/slice).
>> >>> >>
>> >>> >> Additional parameters, bytes/io_burst_conf, are defined for a tg; they
>> >>> >> define the number of bytes/ios that must be depleted before throttling
>> >>> >> happens. A tg that does not deplete this allowance functions as though
>> >>> >> it has no configured limits. tgs earn additional allowance at the rate
>> >>> >> defined by bps/iops for the tg. Once a tg has *_disp > *_burst_conf,
>> >>> >> throttling kicks in. If a tg is idle for a while, it will again have
>> >>> >> some burst allowance before it gets throttled again.
>> >>> >>
>> >>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
>> >>> >> i.e. until all "used" burst allowance has been earned back. trim_slice
>> >>> >> still advances slice_start and decrements *_disp as before, and tgs
>> >>> >> continue to get bytes/ios in throtl_slice intervals.
>> >>> >
>> >>> > Can you describe why we need this? It would be great if you can describe the
>> >>> > usage model and an example. Does this work for io.low/io.max or both?
>> >>> >
>> >>> > Thanks,
>> >>> > Shaohua
>> >>> >
>> >>>
>> >>> The use case that brought this up was configuring limits for a remote
>> >>> shared device. Bursting beyond io.max is desired, but only up to a
>> >>> point before the limit kicks in; afterwards, with sustained usage,
>> >>> throughput is capped. (This proactively avoids remote-side limits.) In
>> >>> that case one would configure io.max + io.burst in a root container,
>> >>> and configure low/other limits on descendants sharing the resource on
>> >>> the same node.
>> >>>
>> >>> With this patch, so long as a tg has not dispatched more than the
>> >>> burst, no limit is applied at all by that tg, including the limit
>> >>> imposed by io.low in tg_iops_limit, etc.
>> >>
>> >> I'd appreciate if you can give more details about the 'why'. 'configuring
>> >> limits for a remote shared device' doesn't justify the change.
>> >
>> > This is to configure a bursty workload (and associated device) with a
>> > known/allowed expected burst size, but to not allow full utilization
>> > of the device for extended periods of time, for QoS. During idle or
>> > low-use periods the burst allowance accrues, and then tasks can burst
>> > well beyond the configured throttle up to the limit, after which they
>> > are throttled. A constant throttle speed isn't sufficient for this, as
>> > you can only burst one slice's worth, but a limit of sorts is desirable
>> > for preventing over-utilization of the shared device. This type of
>> > limit is also slightly different from what I understand io.low does in
>> > local cases, in that a tg is only high priority/unthrottled if it is
>> > bursty, and is limited under constant usage.
>> >
>> > Khazhy
>>
>> Hi Shaohua,
>>
>> Does this clarify the reason for this patch? Is this (or something
>> similar) a good fit for inclusion in blk-throttle?
>>
>
> So does this burst have to be per cgroup? I mean, if throtl_slice was
> configurable, that would allow controlling the size of the burst (just
> that it would be for all cgroups). If that works, that might be a
> simpler solution.
>
> Vivek

The purpose of this configuration vs. increasing throtl_slice is the
behavior when the burst runs out. io/bytes allowance is given in
intervals of throtl_slice, so with a long throtl_slice, devices that
exceed the limit will see extended periods with no IO rather than IO at
the throttled speed. With this patch, once the burst runs out, the device
can continue to be used smoothly at the configured throttled speed, since
the burst allowance is on top of the throttle. For this we do want a
throttle group with both the "steady state" rate and the burst amount,
and we get cgroup support with that.
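
A rough illustration of that difference, as a toy Python model rather
than blk-throttle code: the function names slice_throttle and
burst_bucket, the 1 ms tick, and all numbers are hypothetical. The slice
model grants rate * slice bytes at each slice boundary (a simplification
of allowance being handed out in throtl_slice intervals); the burst model
is a plain token bucket that refills at the throttled rate and holds the
burst on top of it.

```python
# Toy model, not blk-throttle code. Units (bytes, 1 ms ticks) and all
# parameter values are hypothetical, chosen only to show the shape of the IO.

def slice_throttle(rate, slice_ms, demand, duration):
    """Allowance handed out in whole slices of rate * slice_ms bytes.

    Simplification: a greedy group spends each slice's allowance as early as
    it can, then dispatches nothing until the next slice begins.
    """
    sent = []
    allowance = 0
    for t in range(duration):
        if t % slice_ms == 0:
            allowance = rate * slice_ms   # new slice: grant a full slice's worth
        n = min(demand, allowance)
        allowance -= n
        sent.append(n)
    return sent


def burst_bucket(rate, burst, demand, duration):
    """Token bucket refilled at `rate` per tick, holding `burst` on top of it.

    Starts with the burst available, as if the group had been idle long
    enough to earn its whole burst allowance back.
    """
    cap = burst + rate                    # burst sits on top of the steady rate
    tokens = burst
    sent = []
    for _ in range(duration):
        tokens = min(cap, tokens + rate)  # earn back allowance every tick
        n = min(demand, tokens)
        tokens -= n
        sent.append(n)
    return sent


if __name__ == "__main__":
    slc = slice_throttle(rate=10, slice_ms=1000, demand=100, duration=2000)
    bkt = burst_bucket(rate=10, burst=9000, demand=100, duration=2000)
    print("slice model: idle ticks =", slc.count(0))                      # 1800
    print("burst model: idle ticks =", bkt.count(0), "min =", min(bkt))   # 0, 10
```

With a 1000 ms slice, the greedy group in the slice model is idle for
900 ms of every slice once its allowance is spent, whereas the bucket
model serves the burst at full demand for the first 100 ms and then
settles at 10 bytes/ms with no gaps.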

I notice that with cgroupv2 io, it no longer seems possible to configure
a device-wide throttle group, e.g. on the root cgroup (and putting
restrictions on the root cgroup isn't an option). For something like
this, it does make sense to want to configure it just for the device
rather than per cgroup. Perhaps there is somewhere better it would fit
than as a cgroup option, e.g. configuration on the device node for a
device-wide throttle group?

Khazhy
