Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping

From: Laura Abbott
Date: Fri Oct 12 2018 - 13:51:49 EST


On 10/10/2018 04:33 PM, John Stultz wrote:
Since 4.12 we have seen graphics performance issues on the
HiKey960, much later narrowed down to commit 2a55e7b5e544
("staging: android: ion: Call dma_map_sg for syncing and mapping").

This was initially confounded by the fact that the out-of-tree
DRM driver was using a HiSi custom ION heap which broke with the
4.12 ION ABI changes, so there was lots of suspicion that the
performance problems were due to switching to a fairly simple
CMA-based DRM driver for the HiKey960. Additionally, as no
performance regression was seen w/ the original HiKey board
(which is SMP, not big.LITTLE as w/ HiKey960), there was some
thought that the out-of-tree EAS code wasn't quite optimized.

But after chasing a number of other leads, I found that
reverting the ION code to its 4.11-era state got the majority of
the graphics performance back (there may yet be further EAS
tweaks needed), which led me to the dma_map_sg change.

In talking w/ Laura and Liam, it was suspected that the extra
cache operations were causing the trouble. Additionally, I found
that part of the reason we didn't see this w/ the original
HiKey board is that its (proprietary blob) GL code uses ion_mmap
and calls ion_map_dma_buf very rarely, whereas with the
HiKey960, the (also proprietary blob) GL code calls
ion_map_dma_buf much more frequently via the kernel driver.

Anyway, with the cause of the performance regression isolated,
I've tried to find a way to improve the performance of the
current code.

This approach, which I've mostly copied from the drm_prime
implementation, is to track the direction we're mapping the
buffers so we can avoid calling dma_map/unmap_sg on every
ion_map_dma_buf/ion_unmap_dma_buf call, and instead try to do
the work in the attach/detach paths.
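
In essence, it boils down to something like this (a simplified,
untested sketch; the helper name here is just for illustration
and not part of the actual patch below):

	/*
	 * Cache the mapping in the attachment: repeated map calls in
	 * the same direction become no-ops, and the real
	 * dma_unmap_sg() happens only once, at detach time.
	 */
	static struct sg_table *
	ion_map_attachment_cached(struct ion_dma_buf_attachment *a,
				  struct device *dev,
				  enum dma_data_direction dir)
	{
		if (a->dir == dir)		/* already mapped this way */
			return a->table;
		if (WARN_ON(a->dir != DMA_NONE))/* mapped w/ another direction */
			return ERR_PTR(-EBUSY);
		if (!dma_map_sg(dev, a->table->sgl, a->table->nents, dir))
			return ERR_PTR(-ENOMEM);
		a->dir = dir;			/* unmapped later, at detach */
		return a->table;
	}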

I'm not 100% sure of the correctness here, so close review would
be good, but it gets the performance back to roughly what we saw
with the ION code reverted to its 4.11-era state.

Feedback would be greatly appreciated!

Cc: Beata Michalska <Beata.Michalska@xxxxxxx>
Cc: Matt Szczesiak <matt.szczesiak@xxxxxxx>
Cc: Anders Pedersen <Anders.Pedersen@xxxxxxx>
Cc: John Reitan <John.Reitan@xxxxxxx>
Cc: Liam Mark <lmark@xxxxxxxxxxxxxx>
Cc: Laura Abbott <labbott@xxxxxxxxxx>
Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Todd Kjos <tkjos@xxxxxxxxxxx>
Cc: Martijn Coenen <maco@xxxxxxxxxxx>
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx>
---
drivers/staging/android/ion/ion.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 9907332..a4d7fca 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -199,6 +199,7 @@ struct ion_dma_buf_attachment {
 	struct device *dev;
 	struct sg_table *table;
 	struct list_head list;
+	enum dma_data_direction dir;
 };
 
 static int ion_dma_buf_attach(struct dma_buf *dmabuf,
@@ -220,6 +221,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf,
 
 	a->table = table;
 	a->dev = attachment->dev;
+	a->dir = DMA_NONE;
 	INIT_LIST_HEAD(&a->list);
 
 	attachment->priv = a;
@@ -236,6 +238,18 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
 {
 	struct ion_dma_buf_attachment *a = attachment->priv;
 	struct ion_buffer *buffer = dmabuf->priv;
+	struct sg_table *table;
+
+	if (!a)
+		return;
+
+	table = a->table;
+	if (table) {
+		if (a->dir != DMA_NONE)
+			dma_unmap_sg(attachment->dev, table->sgl, table->nents,
+				     a->dir);
+		sg_free_table(table);
+	}
 
 	free_duped_table(a->table);
 	mutex_lock(&buffer->lock);
@@ -243,6 +257,7 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
 	mutex_unlock(&buffer->lock);
 
 	kfree(a);
+	attachment->priv = NULL;
 }
 
 static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
@@ -251,12 +266,24 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
 	struct ion_dma_buf_attachment *a = attachment->priv;
 	struct sg_table *table;
 
-	table = a->table;
+	if (WARN_ON(direction == DMA_NONE || !a))
+		return ERR_PTR(-EINVAL);
 
-	if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
-			direction))
-		return ERR_PTR(-ENOMEM);
+	if (a->dir == direction)
+		return a->table;
 
+	if (WARN_ON(a->dir != DMA_NONE))
+		return ERR_PTR(-EBUSY);
+
+	table = a->table;
+	if (!IS_ERR(table)) {
+		if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
+				direction)) {
+			table = ERR_PTR(-ENOMEM);
+		} else {
+			a->dir = direction;
+		}
+	}
 	return table;
 }
@@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
 				struct sg_table *table,
 				enum dma_data_direction direction)
 {
-	dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);

This changes the semantics so that the only time a buffer
gets unmapped is on detach. I don't think we want to restrict
Ion to that behavior, but I also don't know if anyone else
is relying on the current unmap-on-every-call behavior. I thought
there might have been some Qualcomm stuff that did that (Liam? Todd?)

I suspect most of the cost of the dma_map/dma_unmap calls comes
from the cache flushing and not the actual mapping operations. If
that's the case, another option might be to figure out how to
incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
to decide when they actually want to sync.
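
Roughly something like the following (only a rough, untested
sketch; the plumbing to get the attrs down through the dma-buf
ops is the part that doesn't exist yet):

	/* Map without the implicit CPU cache maintenance ... */
	int nents = dma_map_sg_attrs(attachment->dev, table->sgl,
				     table->nents, direction,
				     DMA_ATTR_SKIP_CPU_SYNC);
	if (!nents)
		return ERR_PTR(-ENOMEM);

	/*
	 * ... and have the driver sync explicitly, only when the CPU
	 * actually touched the buffer:
	 */
	dma_sync_sg_for_device(attachment->dev, table->sgl,
			       table->nents, direction);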

Thanks,
Laura

 }
 
 static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)