On Thu, Dec 10, 2020 at 02:23:09AM +0800, John Garry wrote:
Leizhen reported some time ago that IOVA performance may degrade over time
[0], but unfortunately his solution to fix this problem was not given
attention.
To summarize, the issue is that as time goes by, the CPU rcache and depot
rcache continue to grow. As such, IOVA RB tree access time also continues
to grow.
At a certain point, a depot may become full, and also some CPU rcaches may
also be full when inserting another IOVA is attempted. For this scenario,
currently the "loaded" CPU rcache is freed and a new one is created. This
freeing means that many IOVAs in the RB tree need to be freed, which
makes IO throughput performance fall off a cliff in some storage scenarios:
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]
And continue in this fashion, without recovering. Note that in this
example it was required to wait 16 hours for this to occur. Also note that
IO throughput also becomes gradually becomes more unstable leading up to
this point.
This problem is only seen for non-strict mode. For strict mode, the rcaches
stay quite compact.
It would be good to understand why the rcache doesn't stabilize. Could be
a bug, or just need some tuning
In strict mode, if a driver does Alloc-Free-Alloc and the first alloc
misses the rcache, the second allocation hits it. The same sequence in
non-strict mode misses the cache twice, because the IOVA is added to the
flush queue on Free.
So rather than AFAFAF.. we get AAA..FFF.., only once the fq_timer triggers
or the FQ is full. Interestingly the FQ size is 2x IOVA_MAG_SIZE, so we
could allocate 2 magazines worth of fresh IOVAs before alloc starts
hitting the cache. If a job allocates more than that, some magazines are
going to the depot, and with multi-CPU jobs those will get used on other
CPUs during the next alloc bursts, causing the progressive increase in
rcache consumption. I wonder if setting IOVA_MAG_SIZE > IOVA_FQ_SIZE helps
reuse of IOVAs?
Then again I haven't worked out the details, might be entirely wrong. I'll
have another look next week.
Thanks,
Jean
As a solution to this issue, judge that the IOVA caches have grown too big
when cached magazines need to be free, and just flush all the CPUs rcaches
instead.
The depot rcaches, however, are not flushed, as they can be used to
immediately replenish active CPUs.
In future, some IOVA compaction could be implemented to solve the
instability issue, which I figure could be quite complex to implement.
[0] https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leizhen@xxxxxxxxxx/
Analyzed-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
Reported-by: Xiang Chen <chenxiang66@xxxxxxxxxxxxx>
Tested-by: Xiang Chen <chenxiang66@xxxxxxxxxxxxx>
Signed-off-by: John Garry <john.garry@xxxxxxxxxx>
Reviewed-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
---
drivers/iommu/iova.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 732ee687e0e2..39b7488de8bb 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -841,7 +841,6 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
struct iova_rcache *rcache,
unsigned long iova_pfn)
{
- struct iova_magazine *mag_to_free = NULL;
struct iova_cpu_rcache *cpu_rcache;
bool can_insert = false;
unsigned long flags;
@@ -863,13 +862,12 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
if (cpu_rcache->loaded)
rcache->depot[rcache->depot_size++] =
cpu_rcache->loaded;
- } else {
- mag_to_free = cpu_rcache->loaded;
+ can_insert = true;
+ cpu_rcache->loaded = new_mag;
}
spin_unlock(&rcache->lock);
-
- cpu_rcache->loaded = new_mag;
- can_insert = true;
+ if (!can_insert)
+ iova_magazine_free(new_mag);
}
}
@@ -878,10 +876,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
spin_unlock_irqrestore(&cpu_rcache->lock, flags);
- if (mag_to_free) {
- iova_magazine_free_pfns(mag_to_free, iovad);
- iova_magazine_free(mag_to_free);
- }
+ if (!can_insert)
+ free_all_cpu_cached_iovas(iovad);
return can_insert;
}
--
2.26.2
_______________________________________________
iommu mailing list
iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/iommu