Re: [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()

From: Akinobu Mita
Date: Mon Sep 29 2014 - 09:21:33 EST


2014-09-29 5:45 GMT+09:00 Chuck Ebbert <cebbert.lkml@xxxxxxxxx>:
> On Mon, 29 Sep 2014 00:52:03 +0900
> Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>
>> If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() tries to
>> allocate a memory region with dma_alloc_from_contiguous() before
>> trying alloc_pages().
>>
>> This wastes the CMA region on small DMA-coherent buffers that
>> alloc_pages() could provide. It also degrades performance because,
>> as reported by Peter Hurley, it drives _all_ DMA mapping allocations
>> through a _very_ small window.
>>
>> Fix this by trying alloc_pages() first in
>> dma_generic_alloc_coherent(), as dma_alloc_from_contiguous() should
>> be called only for huge allocations.
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita@xxxxxxxxx>
>> Reported-by: Peter Hurley <peter@xxxxxxxxxxxxxxxxxx>
>> Cc: Peter Hurley <peter@xxxxxxxxxxxxxxxxxx>
>> Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> Cc: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
>> Cc: Don Dutile <ddutile@xxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>> Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
>> Cc: Yinghai Lu <yinghai@xxxxxxxxxx>
>> Cc: x86@xxxxxxxxxx
>> Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
>> ---
>> arch/x86/kernel/pci-dma.c | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index a25e202..0402266 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
>>
>>  	flag &= ~__GFP_ZERO;
>>  again:
>> -	page = NULL;
>> +	page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
>> +				get_order(size));
>
> Only try small allocs here, e.g. when order < PAGE_ALLOC_COSTLY_ORDER?
>
>>  	/* CMA can be used only in the context which permits sleeping */
>> -	if (flag & __GFP_WAIT) {
>> +	if (!page && (flag & __GFP_WAIT)) {
>>  		page = dma_alloc_from_contiguous(dev, count, get_order(size));
>>  		if (page && page_to_phys(page) + size > dma_mask) {
>>  			dma_release_from_contiguous(dev, page, count);
>>  			page = NULL;
>>  		}
>>  	}
>> -	/* fallback */
>> -	if (!page)
>> -		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
>
> (I forgot to add this in my first reply). I think it should try for a
> small alloc without CMA first, then try CMA, and then this final
> fallback for larger allocs.

My main concern is the performance problem reported by Peter Hurley.
Your suggestion could be a solution, but I would like to hear Peter's
opinion.

For now, I prefer the approach in this patch because it has less
impact on kernels with CONFIG_DMA_CMA enabled. It can be improved
later on.
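
For reference, the three-stage ordering suggested above (small
allocation without CMA, then CMA, then a final fallback) could look
roughly like the userspace model below. This is only a sketch, not
kernel code: struct model_state, the model_*() helpers and
COSTLY_ORDER are hypothetical names invented for illustration, and
COSTLY_ORDER is assumed to stand in for PAGE_ALLOC_COSTLY_ORDER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define COSTLY_ORDER 3	/* assumed stand-in for PAGE_ALLOC_COSTLY_ORDER */

enum alloc_source { SRC_NONE, SRC_BUDDY, SRC_CMA };

struct model_state {
	unsigned int buddy_max_order;	/* largest order the model page allocator can satisfy */
	bool cma_available;		/* whether the model CMA area has room */
	unsigned int cma_calls;		/* how often the CMA path was attempted */
};

/* Stands in for alloc_pages_node(): succeeds up to buddy_max_order. */
static bool model_buddy_alloc(const struct model_state *s, unsigned int order)
{
	return order <= s->buddy_max_order;
}

/* Stands in for dma_alloc_from_contiguous(): counts every attempt. */
static bool model_cma_alloc(struct model_state *s)
{
	s->cma_calls++;
	return s->cma_available;
}

static enum alloc_source model_alloc_coherent(struct model_state *s,
					      unsigned int order,
					      bool can_sleep)
{
	assert(s != NULL);

	/* 1. Small requests go to the page allocator first, bypassing CMA. */
	if (order < COSTLY_ORDER && model_buddy_alloc(s, order))
		return SRC_BUDDY;

	/* 2. CMA is only attempted in a context that may sleep. */
	if (can_sleep && model_cma_alloc(s))
		return SRC_CMA;

	/* 3. Final fallback to the page allocator for everything else. */
	if (model_buddy_alloc(s, order))
		return SRC_BUDDY;

	return SRC_NONE;
}
```

In this model an order-0 request never touches the CMA path, while an
order-5 request tries CMA first when sleeping is allowed and uses the
page allocator otherwise.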