Re: [PATCH 1/1] Revert "iommu/iova: Retry from last rb tree node if iova search fails"

From: John Garry
Date: Thu Feb 25 2021 - 08:56:55 EST


On 29/01/2021 12:03, Robin Murphy wrote:
On 2021-01-29 09:48, Leizhen (ThunderTown) wrote:

Currently, we are thinking about the solution to the problem. However, because the end time of v5.11 is approaching, this patch is sent first.

However, that commit was made for a reason - how do we justify that one thing being slow is more important than another thing being completely broken? It's not practical to just keep doing the patch hokey-cokey based on whoever shouts loudest :(

On 2021/1/29 17:21, Zhen Lei wrote:
This reverts commit 4e89dce725213d3d0b0475211b500eda4ef4bf2f.

We find that this patch has a great impact on performance. According to
our test: the iops decreases from 1655.6K to 893.5K, about half.

Hardware: 1 SAS expander with 12 SAS SSD
Command:  Only the main parameters are listed.
           fio bs=4k rw=read iodepth=128 cpus_allowed=0-127

FWIW, I'm 99% sure that what you really want is [1], but then you get to battle against an unknown quantity of dodgy firmware instead.


Something which has not been said before is that this only happens for strict mode.

Anyway, we see ~50% throughput regression, which is intolerable. As seen in [0], I put this down to the fact that we have so many IOVA requests which exceed the rcache size limit, which means many RB tree accesses for non-cacheble IOVAs, which are now slower.

On another point, as for longterm IOVA aging issue, it seems that there is no conclusion there. However I did mention the issue of IOVA sizes exceeding rcache size for that issue, so maybe we can find a common solution. Similar to a fixed rcache depot size, it seems that having a fixed rcache max size range value (at 6) doesn't scale either.

As for 4e89dce72521, so even if it's proper to retry for a failed alloc, it is not always necessary. I mean, if we're limiting ourselves to 32b subspace for this SAC trick and we fail the alloc, then we can try the space above 32b first (if usable). If that fails, then retry there. I don't see a need to retry the 32b subspace if we're not limited to it. How about it? We tried that idea and it looks to just about restore performance.

Thanks,
John

[0] https://raw.githubusercontent.com/hisilicon/kernel-dev/topic-iommu-5.10-iova-debug-v3/aging_test