Re: [PATCHv3 1/1] block: introduce content activity based ioprio

From: Zhaoyang Huang
Date: Fri Jan 26 2024 - 04:09:40 EST


loop more mm and fs guys for more comments

On Thu, Jan 25, 2024 at 3:22 PM zhaoyang.huang
<zhaoyang.huang@xxxxxxxxxx> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
>
> Currently, request's ioprio are set via task's schedule priority(when no
> blkcg configured), which has high priority tasks possess the privilege on
> both of CPU and IO scheduling.
> This commit works as a hint of original policy by promoting the request ioprio
> based on the page/folio's activity. The original idea comes from LRU_GEN
> which provides more precised folio activity than before. This commit try
> to adjust the request's ioprio when certain part of its folios are hot,
> which indicate that this request carry important contents and need be
> scheduled ealier.
>
> This commit is verified on a v6.6 6GB RAM android14 system via 4 test cases
> by changing the bio_add_page/folio API in ext4 and f2fs.
>
> Case 1:
> script[a] which get significant improved fault time as expected[b]
> where dd's cost also shrink from 55s to 40s.
> (1). fault_latency.bin is an ebpf based test tool which measure all task's
> iowait latency during page fault when scheduled out/in.
> (2). costmem generate page fault by mmaping a file and access the VA.
> (3). dd generate concurrent vfs io.
>
> [a]
> ./fault_latency.bin 1 5 > /data/dd_costmem &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> dd if=/dev/block/sda of=/data/ddtest bs=1024 count=2048000 &
> dd if=/dev/block/sda of=/data/ddtest1 bs=1024 count=2048000 &
> dd if=/dev/block/sda of=/data/ddtest2 bs=1024 count=2048000 &
> dd if=/dev/block/sda of=/data/ddtest3 bs=1024 count=2048000
> [b]
> mainline commit
> io wait 836us 156us
>
> Case 2:
> fio -filename=/dev/block/by-name/userdata -rw=randread -direct=0 -bs=4k -size=2000M -numjobs=8 -group_reporting -name=mytest
> mainline: 513MiB/s
> READ: bw=531MiB/s (557MB/s), 531MiB/s-531MiB/s (557MB/s-557MB/s), io=15.6GiB (16.8GB), run=30137-30137msec
> READ: bw=543MiB/s (569MB/s), 543MiB/s-543MiB/s (569MB/s-569MB/s), io=15.6GiB (16.8GB), run=29469-29469msec
> READ: bw=474MiB/s (497MB/s), 474MiB/s-474MiB/s (497MB/s-497MB/s), io=15.6GiB (16.8GB), run=33724-33724msec
> READ: bw=535MiB/s (561MB/s), 535MiB/s-535MiB/s (561MB/s-561MB/s), io=15.6GiB (16.8GB), run=29928-29928msec
> READ: bw=523MiB/s (548MB/s), 523MiB/s-523MiB/s (548MB/s-548MB/s), io=15.6GiB (16.8GB), run=30617-30617msec
> READ: bw=492MiB/s (516MB/s), 492MiB/s-492MiB/s (516MB/s-516MB/s), io=15.6GiB (16.8GB), run=32518-32518msec
> READ: bw=533MiB/s (559MB/s), 533MiB/s-533MiB/s (559MB/s-559MB/s), io=15.6GiB (16.8GB), run=29993-29993msec
> READ: bw=524MiB/s (550MB/s), 524MiB/s-524MiB/s (550MB/s-550MB/s), io=15.6GiB (16.8GB), run=30526-30526msec
> READ: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=15.6GiB (16.8GB), run=30269-30269msec
> READ: bw=449MiB/s (471MB/s), 449MiB/s-449MiB/s (471MB/s-471MB/s), io=15.6GiB (16.8GB), run=35629-35629msec
>
> commit: 633MiB/s
> READ: bw=668MiB/s (700MB/s), 668MiB/s-668MiB/s (700MB/s-700MB/s), io=15.6GiB (16.8GB), run=23952-23952msec
> READ: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=15.6GiB (16.8GB), run=27164-27164msec
> READ: bw=638MiB/s (669MB/s), 638MiB/s-638MiB/s (669MB/s-669MB/s), io=15.6GiB (16.8GB), run=25071-25071msec
> READ: bw=714MiB/s (749MB/s), 714MiB/s-714MiB/s (749MB/s-749MB/s), io=15.6GiB (16.8GB), run=22409-22409msec
> READ: bw=600MiB/s (629MB/s), 600MiB/s-600MiB/s (629MB/s-629MB/s), io=15.6GiB (16.8GB), run=26669-26669msec
> READ: bw=592MiB/s (621MB/s), 592MiB/s-592MiB/s (621MB/s-621MB/s), io=15.6GiB (16.8GB), run=27036-27036msec
> READ: bw=691MiB/s (725MB/s), 691MiB/s-691MiB/s (725MB/s-725MB/s), io=15.6GiB (16.8GB), run=23150-23150msec
> READ: bw=569MiB/s (596MB/s), 569MiB/s-569MiB/s (596MB/s-596MB/s), io=15.6GiB (16.8GB), run=28142-28142msec
> READ: bw=563MiB/s (590MB/s), 563MiB/s-563MiB/s (590MB/s-590MB/s), io=15.6GiB (16.8GB), run=28429-28429msec
> READ: bw=712MiB/s (746MB/s), 712MiB/s-712MiB/s (746MB/s-746MB/s), io=15.6GiB (16.8GB), run=22478-22478msec
>
> Case 3:
> This commit is also verified by the case of launching camera APP which is
> usually considered as heavy working load on both of memory and IO, which
> shows 12%-24% improvement.
>
> ttl = 0 ttl = 50 ttl = 100
> mainline 2267ms 2420ms 2316ms
> commit 1992ms 1806ms 1998ms
>
> case 4:
> androbench has no improvment as well as regression which supposed to be
> its test time is short which MGLRU hasn't take effect yet.
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> ---
> change of v2: calculate page's activity via helper function
> change of v3: solve layer violation by move API into mm
> change of v4: keep block clean by removing the page related API
> ---
> ---
> include/linux/act_ioprio.h | 62 +++++++++++++++++++++++++++++++++++++
> include/uapi/linux/ioprio.h | 44 +++++++++++++++++++++++---
> mm/Kconfig | 8 +++++
> 3 files changed, 110 insertions(+), 4 deletions(-)
> create mode 100644 include/linux/act_ioprio.h
>
> diff --git a/include/linux/act_ioprio.h b/include/linux/act_ioprio.h
> new file mode 100644
> index 000000000000..8cfb3df270bd
> --- /dev/null
> +++ b/include/linux/act_ioprio.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _ACT_IOPRIO_H
> +#define _ACT_IOPRIO_H
> +
> +#include <linux/bio.h>
> +
> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> +bool BIO_ADD_FOLIO(struct bio *bio, struct folio *folio, size_t len,
> + size_t off)
> +{
> + int class, level, hint, activity;
> +
> + if (len > UINT_MAX || off > UINT_MAX)
> + return false;
> +
> + class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> + level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> + hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> + activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> +
> + activity += (bio->bi_vcnt + 1 <= IOPRIO_NR_ACTIVITY &&
> + PageWorkingset(&folio->page)) ? 1 : 0;
> + if (activity >= bio->bi_vcnt / 2)
> + class = IOPRIO_CLASS_RT;
> + else if (activity >= bio->bi_vcnt / 4)
> + class = max(IOPRIO_PRIO_CLASS(get_current_ioprio()), IOPRIO_CLASS_BE);
> +
> + bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> +
> + return bio_add_page(bio, &folio->page, len, off) > 0;
> +}
> +
> +int BIO_ADD_PAGE(struct bio *bio, struct page *page,
> + unsigned int len, unsigned int offset)
> +{
> + int class, level, hint, activity;
> +
> + if (bio_add_page(bio, page, len, offset) > 0) {
> + class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> + level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> + hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> + activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> + activity += (bio->bi_vcnt <= IOPRIO_NR_ACTIVITY && PageWorkingset(page)) ? 1 : 0;
> + bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> + }
> +
> + return len;
> +}
> +#else
> +bool BIO_ADD_FOLIO(struct bio *bio, struct folio *folio, size_t len,
> + size_t off)
> +{
> + return bio_add_folio(bio, folio, len, off);
> +}
> +
> +int BIO_ADD_PAGE(struct bio *bio, struct page *page,
> + unsigned int len, unsigned int offset)
> +{
> + return bio_add_page(bio, page, len, offset);
> +}
> +#endif
> +#endif
> diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
> index bee2bdb0eedb..f933af54d71e 100644
> --- a/include/uapi/linux/ioprio.h
> +++ b/include/uapi/linux/ioprio.h
> @@ -71,12 +71,24 @@ enum {
> * class and level.
> */
> #define IOPRIO_HINT_SHIFT IOPRIO_LEVEL_NR_BITS
> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> +#define IOPRIO_HINT_NR_BITS 3
> +#else
> #define IOPRIO_HINT_NR_BITS 10
> +#endif
> #define IOPRIO_NR_HINTS (1 << IOPRIO_HINT_NR_BITS)
> #define IOPRIO_HINT_MASK (IOPRIO_NR_HINTS - 1)
> #define IOPRIO_PRIO_HINT(ioprio) \
> (((ioprio) >> IOPRIO_HINT_SHIFT) & IOPRIO_HINT_MASK)
>
> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> +#define IOPRIO_ACTIVITY_SHIFT (IOPRIO_HINT_NR_BITS + IOPRIO_LEVEL_NR_BITS)
> +#define IOPRIO_ACTIVITY_NR_BITS 7
> +#define IOPRIO_NR_ACTIVITY (1 << IOPRIO_ACTIVITY_NR_BITS)
> +#define IOPRIO_ACTIVITY_MASK (IOPRIO_NR_ACTIVITY - 1)
> +#define IOPRIO_PRIO_ACTIVITY(ioprio) \
> + (((ioprio) >> IOPRIO_ACTIVITY_SHIFT) & IOPRIO_ACTIVITY_MASK)
> +#endif
> /*
> * I/O hints.
> */
> @@ -104,24 +116,48 @@ enum {
>
> #define IOPRIO_BAD_VALUE(val, max) ((val) < 0 || (val) >= (max))
>
> +#ifndef CONFIG_CONTENT_ACT_BASED_IOPRIO
> /*
> * Return an I/O priority value based on a class, a level and a hint.
> */
> static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> - int priohint)
> + int priohint)
> {
> if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
> - IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> - IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
> + IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> + IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
> return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
>
> return (prioclass << IOPRIO_CLASS_SHIFT) |
> (priohint << IOPRIO_HINT_SHIFT) | priolevel;
> }
> -
> #define IOPRIO_PRIO_VALUE(prioclass, priolevel) \
> ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE)
> #define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint) \
> ioprio_value(prioclass, priolevel, priohint)
> +#else
> +/*
> + * Return an I/O priority value based on a class, a level and a hint.
> + */
> +static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> + int priohint, int activity)
> +{
> + if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
> + IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> + IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS) ||
> + IOPRIO_BAD_VALUE(activity, IOPRIO_NR_ACTIVITY))
> + return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
>
> + return (prioclass << IOPRIO_CLASS_SHIFT) |
> + (activity << IOPRIO_ACTIVITY_SHIFT) |
> + (priohint << IOPRIO_HINT_SHIFT) | priolevel;
> +}
> +
> +#define IOPRIO_PRIO_VALUE(prioclass, priolevel) \
> + ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE, 0)
> +#define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint) \
> + ioprio_value(prioclass, priolevel, priohint, 0)
> +#define IOPRIO_PRIO_VALUE_ACTIVITY(prioclass, priolevel, priohint, activity) \
> + ioprio_value(prioclass, priolevel, priohint, activity)
> +#endif
> #endif /* _UAPI_LINUX_IOPRIO_H */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 264a2df5ecf5..e0e5a5a44ded 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1240,6 +1240,14 @@ config LRU_GEN_STATS
> from evicted generations for debugging purpose.
>
> This option has a per-memcg and per-node memory overhead.
> +
> +config CONTENT_ACT_BASED_IOPRIO
> + bool "Enable content activity based ioprio"
> + depends on LRU_GEN
> + default n
> + help
> + This item enable the feature of adjust bio's priority by
> + calculating its content's activity.
> # }
>
> config ARCH_SUPPORTS_PER_VMA_LOCK
> --
> 2.25.1
>