Re: [PATCH blktests v3 09/12] common/fio: Limit number of random jobs

From: Chaitanya Kulkarni
Date: Thu May 04 2023 - 01:16:46 EST


On 5/3/23 04:01, Daniel Wagner wrote:
> On Wed, May 03, 2023 at 09:41:37AM +0000, Chaitanya Kulkarni wrote:
>> On 5/3/23 01:02, Daniel Wagner wrote:
>>> Limit the number of random threads to 32 for big machines. This still
>>> gives enough randomness but limits the resource usage.
>>>
>>> Signed-off-by: Daniel Wagner <dwagner@xxxxxxx>
>>> ---
>> I don't think we should change this, the point of all the tests is
>> to not limit the resources but use threads at least equal to
>> $(nproc), see recent patches from lenovo they have 448 cores,
>> limiting 32 is < 10% CPUs and that is really small number for
>> a large machine if we decide to run tests on that machine ...
> I just wonder how handle the limits for the job size. Hannes asked to limit it
> to 32 CPUs so that the job size doesn't get small, e.g. nvme_img_size=16M job
> size per job with 448 CPUs is roughly 36kB. Is this good, bad or does it even
> make sense? I don't know.

16M is a very small number.

From my experience, with smaller I/O sizes we don't see the lockdeps
that we see with large I/O sizes, hence it is a bad idea to use small
I/O sizes and to limit the jobs to a hard-coded 32.

> My question is what should the policy be? Should we reject configuration which
> try to run too small jobs sizes? Reject anything below 1M for example? Or is
> there a metric which we could as base for a limit calculation (disk geometry)?

The basic requirement here is that we need to run the I/O from every
processor, so let's keep --numjobs=$(nproc) constant for now and let the
user set the job size.
In this particular case for NVMe we set the size to 1G, and that is
sufficient since numjobs is set to nproc, and with this series the user
can set the size based on a particular arch ...
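
To put numbers on the sizes discussed above, here is a small sketch of
the per-job arithmetic. It assumes the total size is split evenly across
the jobs, as in Daniel's 16M example; the helper name per_job_size is
hypothetical and is not part of blktests or fio.

```python
MIB = 1024 * 1024

def per_job_size(total_bytes, numjobs):
    """Bytes each job gets when total_bytes is split evenly (rounded down)."""
    if numjobs < 1:
        raise ValueError("numjobs must be at least 1")
    return total_bytes // numjobs

# 16M image on a 448-CPU machine: about 36 KiB per job.
small = per_job_size(16 * MIB, 448)
# 1G image on the same machine: a little over 2 MiB per job.
large = per_job_size(1024 * MIB, 448)
print(small, large)
```

Under that assumption, 1G with numjobs equal to nproc still leaves each
job a few MiB even on a 448-core box, while 16M leaves only tens of KiB.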

See [1] if you are interested in how to quantify small or large job size.

For this series to merge, let's keep it simple and not worry about
erroring out on a particular job size, and just keep nproc as it is.

-ck

[1] Ideally, what I've done in the past is:
1. Accept the % of the CPU cores that we want to keep busy.
2. Accept the % of the disk space that we want the test to exercise.
3. Use the combination of #1 and #2 to spread the job size across
   the number of jobs.

With the above design one doesn't have to assume what is a small or what
is a large job size, and the system gets tested according to the user's
expectations, such as 50% of CPUs busy on 80% of the disk size, or 100%
of CPUs busy on 50% of the disk size.
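
The three steps above can be sketched roughly as follows. This is only
an illustration under stated assumptions: plan_jobs and its parameters
are hypothetical names, not anything in blktests or fio, and the split
is a plain even division rounded down.

```python
import os

def plan_jobs(cpu_pct, disk_pct, disk_bytes, ncpus=None):
    """Return (numjobs, bytes_per_job) for the requested percentages.

    cpu_pct  - % of CPU cores to keep busy (step 1)
    disk_pct - % of the disk space to exercise (step 2)
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    if not (0 < cpu_pct <= 100 and 0 < disk_pct <= 100):
        raise ValueError("percentages must be in the range (0, 100]")
    # Step 1: one job per CPU we want busy, never fewer than one job.
    numjobs = max(1, ncpus * cpu_pct // 100)
    # Step 2: the part of the disk the whole test should cover.
    total_bytes = disk_bytes * disk_pct // 100
    # Step 3: spread that space evenly across the jobs.
    return numjobs, total_bytes // numjobs

GIB = 1024 ** 3
# 50% of 448 CPUs on 80% of a 1 GiB device.
print(plan_jobs(50, 80, GIB, ncpus=448))
# 100% of 448 CPUs on 50% of a 1 GiB device.
print(plan_jobs(100, 50, GIB, ncpus=448))
```

The resulting pair could then be handed to fio as the job count and the
per-job size; how exactly that maps onto the test scripts is left open
here.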