Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl

From: Tejun Heo
Date: Thu Jan 05 2023 - 16:39:39 EST


Hello, Michal.

On Thu, Jan 05, 2023 at 08:22:47PM +0100, Michal Koutný wrote:
> On Thu, Jan 05, 2023 at 07:35:59AM -1000, Tejun Heo <tj@xxxxxxxxxx> wrote:
> > Hard limits tend to make this sort of problems a lot more pronounced because
> > the existing mechanisms tend to break down for the users which are severely
> > throttled down even while the device as a whole is fairly idle. cpu.max
> > often triggers severe priority inversions too, so it isn't too surprising
> > that people hit severe priority inversion issues w/ io.max.
>
> To be on the same page:
> 1) severe PI == priority inversion across cgroups (progated e.g. via
> global locks (as with cpu.max) or FS journal (as with io.max)),

Another important inversion vector is memory reclaim as writeback operations
get trapped in blk-throttle and the system can pile on waiting for clean
pages.

> 2) ordinary PI == priority inversion contained within a single cgroup,
> i.e. no different from an under-provisioned system.

I didn't use the term in a precise manner but yeah both can be problematic.
The former a lot more so.

> The reported issue sounds like 2) but even with the separated queues 1)
> is still possible :-/

Yeah, definitely.

> > Another problem with blk-throttle is that it doesn't prioritize shared IOs
> > identified by bio_issue_as_root_blkg() like iolatency and iocost do, so
> > there can be very severe priority inversions when e.g. journal commit gets
> > trapped in a low priority cgroup further exacerbating issues like this.
>
> Thanks for the broader view. So the separated queues are certainly an
> improvement but ultimately a mechanism based on bio_issue_as_root_blkg()
> predicate and deferred throttling would be better? Or is permanent limit
> enforcement more important?

Generally true but the specific scenario raised by Jinke is rather unusual
and isn't covered by issue_as_root mechanism as it doesn't promote REQ_SYNC
data writes. Then again, this usually isn't a problem as in most cases dirty
throttling through writeback should stave off severe starvations.

blk-throttle is pretty outdated and kinda broken and separating out sync
writes isn't gonna solve most of its problems. However, it will help in some
cases like the one described by Jinke and can sometimes lessen wider scope
inversions too. Given the partial nature, I don't feel too enthusiastic but
at the same time can't find good reasons to reject either. I don't know. I
don't feel too strong either way.

On a tangential note, I've been thinking about implementing io.max on top of
iocost so that each cgroup can just configure the max % of total device IO
capacity that's allowed for it, which should be a lot easier to use and has
many of the problems already solved.

Thanks.

--
tejun