Re: [PATCH] mmc:mmc-hsq:use fifo to dispatch mmc_request

From: Michael Wu
Date: Mon Nov 21 2022 - 01:19:08 EST


On 11/18/2022 7:43 PM, Wenchao Chen wrote:
On Fri, Nov 18, 2022 at 1:52 PM Michael Wu <michael@xxxxxxxxxxxxxxxxx> wrote:

The current next_tag selection can delay some requests for a long time and
undo the ordering decided by the block scheduling layer, because the tags of
issued mrqs are not guaranteed to be sequential, especially under heavy IO
load. In the fio performance test, we found that a 4k random read request
sent to mmc_hsq waited nearly 200ms before request_atomic was called for it,
while mmc_hsq processed thousands of other requests in the meantime. So we
use a fifo here to guarantee first-in, first-out handling of requests and
avoid adding extra delay to them.


Hi Michael
Is the test device an eMMC?
Could you share the fio test command if you want?
Can you provide more logs?

Hi Wenchao,
Yes, the tested device is eMMC.
The test command we used is `./fio -name=Rand_Read_IOPS_Test -group_reporting -rw=random -bs=4K -numjobs=8 -directory=/data/data -size=1G -io_size=64M -nrfiles=1 -direct=1 -thread && rm /data/Rand_Read_IOPS_Test *`. It replaces the random read performance test of androidbench, with the file size set to 1G and 8 threads. /data uses f2fs, and /data/data is a file-encrypted path.

With hsq enabled, the fio log below shows random read throughput ranging from a minimum of 3175 iops to a maximum of 8554 iops, and io completion latency reaching about 200ms at the 99.99th percentile.
```
clat percentiles (usec):
| 1.00th=[ 498], 5.00th=[ 865], 10.00th=[ 963], 20.00th=[ 1045],
| 30.00th=[ 1090], 40.00th=[ 1139], 50.00th=[ 1172], 60.00th=[ 1221],
| 70.00th=[ 1254], 80.00th=[ 1319], 90.00th=[ 1401], 95.00th=[ 1614],
| 99.00th=[ 2769], 99.50th=[ 3589], 99.90th=[ 31589], 99.95th=[ 66323],
| 99.99th=[200279]
bw ( KiB/s): min=12705, max=34225, per=100.00%, avg=23931.79, stdev=497.40, samples=345
iops : min= 3175, max= 8554, avg=5981.67, stdev=124.38, samples=345
```


```
clat percentiles (usec):
| 1.00th=[ 799], 5.00th=[ 938], 10.00th=[ 963], 20.00th=[ 979],
| 30.00th=[ 996], 40.00th=[ 1004], 50.00th=[ 1020], 60.00th=[ 1045],
| 70.00th=[ 1074], 80.00th=[ 1106], 90.00th=[ 1172], 95.00th=[ 1237],
| 99.00th=[ 1450], 99.50th=[ 1516], 99.90th=[ 1762], 99.95th=[ 2180],
| 99.99th=[ 9503]
bw ( KiB/s): min=29200, max=30944, per=100.00%, avg=30178.91, stdev=53.45, samples=272
iops : min= 7300, max= 7736, avg=7544.62, stdev=13.38, samples=272
```
With hsq NOT enabled (the second log above), random read throughput ranges from a minimum of 7300 iops to a maximum of 7736 iops, and io completion latency is only about 9.5ms at the 99.99th percentile. Finally, we added debug output to the mmc driver and traced the 200ms delay seen with hsq to the next tag selection of hsq.
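
To illustrate why tag-ordered selection can delay one request behind many others, here is a minimal user-space sketch. It is not the mmc_hsq code: `NUM_SLOTS`, `pick_lowest_tag()`, `pick_oldest()` and `dispatched_before_victim()` are hypothetical names, and the model assumes that tag-based selection prefers the lowest occupied tag while low tags keep being refilled under load.

```c
#include <stdbool.h>

#define NUM_SLOTS 64	/* hypothetical queue depth for this model */
#define EMPTY     0

/* Each slot holds the arrival sequence number of its request, or EMPTY. */
static unsigned long slots[NUM_SLOTS];

/* Tag-order model: dispatch the lowest occupied tag. */
static int pick_lowest_tag(void)
{
	for (int tag = 0; tag < NUM_SLOTS; tag++)
		if (slots[tag] != EMPTY)
			return tag;
	return -1;
}

/* FIFO model: dispatch the occupied tag whose request arrived first. */
static int pick_oldest(void)
{
	int best = -1;

	for (int tag = 0; tag < NUM_SLOTS; tag++)
		if (slots[tag] != EMPTY &&
		    (best < 0 || slots[tag] < slots[best]))
			best = tag;
	return best;
}

/*
 * Queue `backlog` requests at the low tags (backlog < NUM_SLOTS), then one
 * "victim" request at the highest tag. Dispatch one request at a time; for
 * the first `refills` completions the freed slot is immediately reused by a
 * new request. Returns how many other requests ran before the victim.
 */
static int dispatched_before_victim(bool fifo, int backlog, int refills)
{
	const int victim = NUM_SLOTS - 1;
	unsigned long seq = 1;
	int served = 0;

	for (int tag = 0; tag < NUM_SLOTS; tag++)
		slots[tag] = EMPTY;
	for (int tag = 0; tag < backlog; tag++)
		slots[tag] = seq++;
	slots[victim] = seq++;

	for (;;) {
		int tag = fifo ? pick_oldest() : pick_lowest_tag();

		if (tag == victim)
			return served;
		served++;
		/* Completion frees the slot; under load it is reused at once. */
		slots[tag] = (served <= refills) ? seq++ : EMPTY;
	}
}
```

In this model, `dispatched_before_victim(false, 4, 1000)` returns 1004, because the victim waits for every refilled request, while `dispatched_before_victim(true, 4, 1000)` returns 4, because only the requests that arrived earlier run first.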

---
Michael Wu