Re: [RFC v1 0/1] nvme testsuite runtime optimization

From: Daniel Wagner
Date: Wed Apr 19 2023 - 07:13:22 EST


On Wed, Apr 19, 2023 at 12:50:10PM +0300, Sagi Grimberg wrote:
>
> > > While testing the fc transport I got a bit tired of wait for the I/O jobs to
> > > finish. Thus here some runtime optimization.
> > >
> > > With a small/slow VM I got following values:
> > >
> > > with 'optimizations'
> > > loop:
> > > real 4m43.981s
> > > user 0m17.754s
> > > sys 2m6.249s
>
> How come loop is doubling the time with this patch?
> ratio is not the same before and after.

The first run was with loop, the second one with rdma:

nvme/002 (create many subsystems and test discovery) [not run]
    runtime 82.089s ...
    nvme_trtype=rdma is not supported in this test

nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
    runtime 39.948s ...
    nvme_trtype=rdma is not supported in this test
nvme/017 (create/delete many file-ns and test discovery) [not run]
    runtime 40.237s ...

nvme/047 (test different queue types for fabric transports) [passed]
    runtime ... 13.580s
nvme/048 (Test queue count changes on reconnect) [passed]
    runtime ... 6.287s

82 + 40 + 40 - 14 - 6 = 142 (seconds). So the loop run executes additional
tests which are skipped for rdma. Hmm, though my optimization didn't work
there...

> > Those jobs are meant to be run for at least 1G to establish
> > confidence on the data set and the system under test since SSDs
> > are in TBs nowadays and we don't even get anywhere close to that,
> > with your suggestion we are going even lower ...
>
> Where does the 1G boundary coming from?

No idea, it's just the existing hard-coded value. I guess it might come from
efa06fcf3c83 ("loop: test partition scanning"), which was the first real test
case (according to the logs).

> > we cannot change the dataset size for slow VMs, instead add
> > a command line argument and pass it to tests e.g.
> > nvme_verification_size=XXX similar to nvme_trtype but don't change
> > the default values which we have been testing for years now
> >
> > Testing is supposed to be time consuming especially verification jobs..
>
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.

Good point, I'll make it configurable. What is a good small default then? There
are some test cases in loop which allocate a 1M file. That's probably too
small.
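
A minimal, untested sketch of how such a knob could look. The variable name
nvme_verification_size is the one suggested above; the helper
_size_to_bytes and the 1G fallback are assumptions for illustration only
(1G is simply the current hard-coded value, not a proposal for the new
default). It relies on numfmt from GNU coreutils.

```shell
#!/bin/bash
# Hypothetical sketch, not existing blktests code.

# Use the caller's value if it is set, otherwise fall back to a default.
# The 1G here is a placeholder taken from the current hard-coded size.
: "${nvme_verification_size:=1G}"

# Hypothetical helper: convert a size such as 1M or 1G into bytes so that
# tests can pass a plain number to whatever tool they use.
_size_to_bytes() {
	numfmt --from=iec "$1"
}

echo "verification size: ${nvme_verification_size}" \
	"($(_size_to_bytes "${nvme_verification_size}") bytes)"
```

The size would then be passed on the command line the same way as
nvme_trtype, e.g. nvme_verification_size=128M in front of the check
invocation, assuming the tests read the variable instead of a literal size.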