Re: [linus:master] [io_uring] caec5ebe77: stress-ng.io-uring.ops_per_sec -33.1% regression

From: Jens Axboe
Date: Wed Aug 16 2023 - 15:28:13 EST


On 8/16/23 9:32 AM, Yin, Fengwei wrote:
> Hi Jens,
>
> On 7/6/2023 3:29 PM, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed a -33.1% regression of stress-ng.io-uring.ops_per_sec on:
>>
>>
>> commit: caec5ebe77f97d948dcf46f07d622bda7f1f6dfd ("io_uring: rely solely on FMODE_NOWAIT")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>>
>> NOTE:
>> one thing we want to mention is we reported
>> "[linux-next:master] [io_uring] caec5ebe77: stress-ng.io-uring.ops_per_sec 32.5% improvement"
>> on
>> https://lore.kernel.org/all/202306061247.510feb23-oliver.sang@xxxxxxxxx/
>> however, by further checking, we realized the test machine ran in abnormal
>> status at that time due to BIOS issue, which we finally fixed recently.
>> please just ignore that report upon linus-next/master.
>>
>>
>> testcase: stress-ng
>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>> parameters:
>>
>> nr_threads: 100%
>> testtime: 60s
>> class: pts
>> test: io-uring
>> cpufreq_governor: performance
>>
>>
>> In addition to that, the commit also has significant impact on the following tests:
>>
>> +------------------+-------------------------------------------------------------------------------------------------+
>> | testcase: change | stress-ng: stress-ng.io-uring.ops_per_sec -1.3% regression |
>> | test machine | 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory |
>> | test parameters | class=pts |
>> | | cpufreq_governor=performance |
>> | | nr_threads=100% |
>> | | test=io-uring |
>> | | testtime=60s |
>> +------------------+-------------------------------------------------------------------------------------------------+
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
>> | Closes: https://lore.kernel.org/oe-lkp/202307060958.3594f22f-oliver.sang@xxxxxxxxx
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> To reproduce:
>>
>> git clone https://github.com/intel/lkp-tests.git
>> cd lkp-tests
>> sudo bin/lkp install job.yaml # job file is attached in this email
>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>> sudo bin/lkp run generated-yaml-file
>>
>> # if come across any failure that blocks the test,
>> # please remove ~/.lkp and /lkp dir to run from a clean state.
>>
>> =========================================================================================
>> class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>> pts/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/io-uring/stress-ng/60s
>>
>> commit:
>> e9833d8701 ("block: mark bdev files as FMODE_NOWAIT if underlying device supports it")
>> caec5ebe77 ("io_uring: rely solely on FMODE_NOWAIT")
> About this regression, some findings in my side:
> - LKP use initrd to do stress-ng testing. That means the stress-ng is using the
> file in initrd as test file.
> - The commit caec5ebe77 makes io_uring rely on FMODE_NOWAIT. But the tmpfs and
> the file on initrd doesn't has that bit set. I suppose this is why we can see
> this regression with LKP which is using initrd.
>
> I tried to let stress-ng.io_uring use the file on tmpfs and also can see
> signifcient performance change with this commit.
>
> - If apply following change based on commit caec5ebe77:
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 7c426584e35a..e755697c773f 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1765,6 +1765,17 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
> */
> static bool __io_file_supports_nowait(struct file *file, umode_t mode)
> {
> + if (S_ISREG(mode)) {
> + struct block_device *bdev = file->f_inode->i_sb->s_bdev;
> +
> + if (IS_ENABLED(CONFIG_BLOCK) &&
> + (!bdev || bdev_nowait(bdev)) &&
> + !io_is_uring_fops(file))
> + return true;
> +
> + return false;
> + }
> +
> /* any ->read/write should understand O_NONBLOCK */
> if (file->f_flags & O_NONBLOCK)
> return true;
>
> The regression is gone with LKP.
>
> - If apply following change based on commit caec5ebe77:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e40a08c5c6d7..e9df664f0063 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3962,9 +3962,16 @@ const struct address_space_operations shmem_aops = {
> };
> EXPORT_SYMBOL(shmem_aops);
>
> +static int shmem_file_open(struct inode *inode, struct file *filp)
> +{
> + filp->f_mode |= FMODE_NOWAIT;
> +
> + return generic_file_open(inode, filp);
> +}
> +
> static const struct file_operations shmem_file_operations = {
> .mmap = shmem_mmap,
> - .open = generic_file_open,
> + .open = shmem_file_open,
> .get_unmapped_area = shmem_get_unmapped_area,
> #ifdef CONFIG_TMPFS
> .llseek = shmem_file_llseek,
>
> The performance change when running stress-ng.io_uring with testing file
> in tmpfs is gone.
>
> This is just the information FYI. I may miss something obviously here. Thanks.

This actually highlighted a problem with the old nowait logic, in that
it assumed !bdev would mean that nowait was fine. Looking at shmem, we
definitely need IOCB_NOWAIT handling in there to make that safe. So the
old code was buggy, and conversely, we can't also just add the
FMODE_NOWAIT without first making those improvements to shmem first.

Basically you'd want to ensure that the read_iter and write_iter paths
honor IOCB_NOWAIT. Once that's done, then FMODE_NOWAIT can indeed be set
in the open helper.

I might take a stab at this, but I'm out sick right now so don't think
it'd be cohesive if I did it right now.

--
Jens Axboe