Re: [PATCHv2 2/2] block: adjust CFS request expire time

From: Zhaoyang Huang
Date: Tue Feb 20 2024 - 06:56:47 EST


Patch v2 makes the adjustment act as a guard against CFS over-preemption,
and it only takes effect for READs.

On Tue, Feb 20, 2024 at 7:46 PM zhaoyang.huang
<zhaoyang.huang@xxxxxxxxxx> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
>
> Under the current policy, CFS tasks may suffer involuntary I/O latency
> when they are preempted by RT/DL tasks or IRQs, since those take
> precedence in both the CPU scheduler and the I/O scheduler. This commit
> introduces an approximate, lightweight method to reduce this effect by
> scaling the expire time by the CFS share of the whole CPU active time.
> The average utilization of a CPU's run queue reflects the historical
> active proportion of each type of task, and it is valid for this goal
> from the following three perspectives:
>
> 1. The load (util) of every sched class is tracked and calculated in
> the same way (using a geometric series known as PELT).
> 2. The legacy policy is kept: the rq's position in fifo_list is NOT
> adjusted; only its expire_time is changed.
> 3. The fixed expire time (hundreds of ms) is in the same range as the
> CPU avg_load accounting series (utilization decays by half in 32 ms).
>
> TaskA
> sched in
> |
> |
> |
> submit_bio
> |
> |
> |
> fifo_time = jiffies + expire
> (insert_request)
>
> TaskB
> sched in
> |
> |
> vfs_xxx
> |
> |preempted by RT,DL,IRQ
> |\
> | This period is unfair to TaskB's IO request and should be adjusted for
> |/
> |
> submit_bio
> |
> |
> |
> fifo_time = jiffies + expire * CFS_PROPORTION(rq)
> (insert_request)
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> ---
> changes in v2: introduce a direction and a threshold so that the hack
> works as a guard against CFS over-preemption.
> ---
> block/mq-deadline.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..b5aa544d69a3 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -54,6 +54,7 @@ enum dd_prio {
>
> enum { DD_PRIO_COUNT = 3 };
>
> +#define CFS_PROP_THRESHOLD 60
> /*
> * I/O statistics per I/O priority. It is fine if these counters overflow.
> * What matters is that these counters are at least as wide as
> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> struct dd_per_prio *per_prio;
> enum dd_prio prio;
> + int fifo_expire;
>
> lockdep_assert_held(&dd->lock);
>
> @@ -839,8 +841,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>
> /*
> * set expire time and add to fifo list
> + * The expire time is adjusted when the current CFS task has been
> + * over-preempted by RT/DL/IRQ, as measured by the proportion of
> + * CPU time spent running CFS tasks over the last several dozen
> + * ms. However, this does NOT affect the rq's position in fifo_list;
> + * it only takes effect when this rq is checked for expiry at the
> + * head of the list.
> */
> - rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> + fifo_expire = dd->fifo_expire[data_dir];
> + if (data_dir == DD_READ &&
> + (cfs_prop_by_util(current, 100) < CFS_PROP_THRESHOLD))
> + fifo_expire = cfs_prop_by_util(current, dd->fifo_expire[data_dir]);
> +
> + rq->fifo_time = jiffies + fifo_expire;
> +
> insert_before = &per_prio->fifo_list[data_dir];
> #ifdef CONFIG_BLK_DEV_ZONED
> /*
> --
> 2.25.1
>