Re: [PATCH 0/5] Optimize iommu_map_sg() performance

From: Sai Prakash Ranjan
Date: Mon Jan 11 2021 - 01:24:27 EST


Hi Isaac,

On 2021-01-09 07:20, Isaac J. Manjarres wrote:
> The iommu_map_sg() code currently iterates through the given
> scatter-gather list, and in the worst case, invokes iommu_map()
> for each element in the scatter-gather list, which calls into
> the IOMMU driver through an indirect call. For an IOMMU driver
> that uses a format supported by the io-pgtable code, the IOMMU
> driver will then call into the io-pgtable code to map the chunk.
>
> Jumping between the IOMMU core code, the IOMMU driver, and the
> io-pgtable code and back for each element in a scatter-gather list
> is not efficient.
>
> Instead, add a map_sg() hook in both the IOMMU driver ops and the
> io-pgtable ops. iommu_map_sg() can then call into the IOMMU driver's
> map_sg() hook with the entire scatter-gather list, which can call
> into the io-pgtable map_sg() hook, which can process the entire
> scatter-gather list, signficantly reducing the number of indirect
> calls, and jumps between these layers, boosting performance.
>
> On a system that uses the ARM SMMU driver, and the ARM LPAE format,
> the current implementation of iommu_map_sg() yields the following
> latencies for mapping scatter-gather lists of various sizes. These
> latencies are calculated by repeating the mapping operation 10 times:
>
> size iommu_map_sg latency
> 4K 0.624 us
> 64K 9.468 us
> 1M 122.557 us
> 2M 239.807 us
> 12M 1435.979 us
> 24M 2884.968 us
> 32M 3832.979 us
>
> On the same system, the proposed modifications yield the following
> results:
>
> size iommu_map_sg latency
> 4K 3.645 us
> 64K 4.198 us
> 1M 11.010 us
> 2M 17.125 us
> 12M 82.416 us
> 24M 158.677 us
> 32M 210.468 us
>
> The procedure for collecting the iommu_map_sg latencies is
> the same in both experiments. Clearly, reducing the jumps
> between the different layers in the IOMMU code offers a
> signficant performance boost in iommu_map_sg() latency.
>

I gave this series a go on chromebook and saw these warnings
and several device probe failures, logs attached below:

WARN corresponds to this code in arm_lpae_map_by_pgsize()

if (WARN_ON(iaext || (paddr + size) >> cfg->oas))
return -ERANGE;

Logs:

[ 2.411391] ------------[ cut here ]------------
[ 2.416149] WARNING: CPU: 6 PID: 56 at drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[ 2.425606] Modules linked in:
[ 2.428749] CPU: 6 PID: 56 Comm: kworker/6:1 Not tainted 5.10.5 #970
[ 2.440287] Workqueue: events deferred_probe_work_func
[ 2.445563] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[ 2.451726] pc : arm_lpae_map_sg+0x234/0x248
[ 2.456112] lr : arm_lpae_map_sg+0xe0/0x248
[ 2.460410] sp : ffffffc010513750
[ 2.463820] x29: ffffffc010513790 x28: ffffffb943332000
[ 2.469281] x27: 0000000ffffff000 x26: ffffffb943d14900
[ 2.474738] x25: 0000000000001000 x24: 0000000103465000
[ 2.480196] x23: 0000000000000001 x22: 0000000103466000
[ 2.485645] x21: 0000000000000003 x20: 0000000000000a20
[ 2.491103] x19: ffffffc010513850 x18: 0000000000000001
[ 2.496562] x17: 0000000000000002 x16: 00000000ffffffff
[ 2.502021] x15: 0000000000000000 x14: 0000000000000000
[ 2.507479] x13: 0000000000000001 x12: 0000000000000000
[ 2.512928] x11: 0000001000000000 x10: 0000000000000000
[ 2.518385] x9 : 0000000000000001 x8 : 0000000040201000
[ 2.523844] x7 : 0000000000000a20 x6 : ffffffb943463000
[ 2.529302] x5 : 0000000000000003 x4 : 0000000000001000
[ 2.534760] x3 : 0000000000000001 x2 : ffffffb941f605a0
[ 2.540219] x1 : 0000000000000003 x0 : 0000000000000e40
[ 2.545679] Call trace:
[ 2.548196] arm_lpae_map_sg+0x234/0x248
[ 2.552225] arm_smmu_map_sg+0x80/0xc4
[ 2.556078] __iommu_map_sg+0x6c/0x188
[ 2.559931] iommu_map_sg_atomic+0x18/0x20
[ 2.564144] iommu_dma_alloc_remap+0x26c/0x34c
[ 2.568703] iommu_dma_alloc+0x9c/0x268
[ 2.572647] dma_alloc_attrs+0x88/0xfc
[ 2.576503] gsi_ring_alloc+0x50/0x144
[ 2.580356] gsi_init+0x2c4/0x5c4
[ 2.583766] ipa_probe+0x14c/0x2b4
[ 2.587263] platform_drv_probe+0x94/0xb4
[ 2.591377] really_probe+0x138/0x348
[ 2.595145] driver_probe_device+0x80/0xb8
[ 2.599358] __device_attach_driver+0x90/0xa8
[ 2.603829] bus_for_each_drv+0x84/0xcc
[ 2.607772] __device_attach+0xc0/0x148
[ 2.611713] device_initial_probe+0x18/0x20
[ 2.616012] bus_probe_device+0x38/0x94
[ 2.619953] deferred_probe_work_func+0x78/0xb0
[ 2.624611] process_one_work+0x210/0x3dc
[ 2.628726] worker_thread+0x284/0x3e0
[ 2.632578] kthread+0x148/0x1a8
[ 2.635891] ret_from_fork+0x10/0x18
[ 2.639562] ---[ end trace 9bac18cad6a9862e ]---
[ 2.644414] ipa 1e40000.ipa: error -12 allocating channel 0 event ring
[ 2.651656] ipa: probe of 1e40000.ipa failed with error -12
[ 2.660072] dwc3 a600000.dwc3: Adding to iommu group 8
[ 2.668632] xhci-hcd xhci-hcd.13.auto: xHCI Host Controller
[ 2.674680] xhci-hcd xhci-hcd.13.auto: new USB bus registered, assigned bus number 1
[ 2.683638] ------------[ cut here ]------------
[ 2.688391] WARNING: CPU: 6 PID: 56 at drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[ 2.697846] Modules linked in:
[ 2.700988] CPU: 6 PID: 56 Comm: kworker/6:1 Tainted: G W 5.10.5 #970
[ 2.713954] Workqueue: events deferred_probe_work_func
[ 2.719228] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[ 2.725390] pc : arm_lpae_map_sg+0x234/0x248
[ 2.729775] lr : arm_lpae_map_sg+0xe0/0x248
[ 2.734073] sp : ffffffc010512e20
[ 2.737483] x29: ffffffc010512e60 x28: ffffffb94345e000
[ 2.742942] x27: 0000000ffffff000 x26: ffffffb941fa1500
[ 2.748400] x25: 0000000000001000 x24: 0000000103468000
[ 2.753858] x23: 0000000000000001 x22: 0000000103469000
[ 2.759318] x21: 0000000000000003 x20: 0000000000000a20
[ 2.764777] x19: ffffffc010512f20 x18: 0000000000000001
[ 2.770235] x17: 0000000000000001 x16: 00000000ffffffff
[ 2.775694] x15: 0000000000000000 x14: 0000000000000001
[ 2.781154] x13: 0000000000000001 x12: 0000000000000000
[ 2.786613] x11: 0000001000000000 x10: 0000000000000000
[ 2.792071] x9 : 0000000000000001 x8 : 0000000040201000
[ 2.797529] x7 : 0000000000000000 x6 : ffffffc010512f20
[ 2.802988] x5 : 0000000000000a20 x4 : 0000000000001000
[ 2.808438] x3 : 0000000000000001 x2 : ffffffb9457d6100
[ 2.813896] x1 : 0000000000000003 x0 : 0000000000000e40
[ 2.819356] Call trace:
[ 2.821878] arm_lpae_map_sg+0x234/0x248
[ 2.825907] arm_smmu_map_sg+0x80/0xc4
[ 2.829761] __iommu_map_sg+0x6c/0x188
[ 2.833615] iommu_map_sg_atomic+0x18/0x20
[ 2.837827] iommu_dma_alloc_remap+0x26c/0x34c
[ 2.842386] iommu_dma_alloc+0x9c/0x268
[ 2.846329] dma_alloc_attrs+0x88/0xfc
[ 2.850184] xhci_mem_init+0x200/0x930
[ 2.854037] xhci_init+0xc0/0xec
[ 2.857350] xhci_gen_setup+0x270/0x348
[ 2.861292] xhci_plat_setup+0x4c/0x58
[ 2.865146] usb_add_hcd+0x288/0x430
[ 2.868815] xhci_plat_probe+0x3f8/0x568
[ 2.872844] platform_drv_probe+0x94/0xb4
[ 2.876957] really_probe+0x138/0x348
[ 2.880725] driver_probe_device+0x80/0xb8
[ 2.884936] __device_attach_driver+0x90/0xa8
[ 2.889407] bus_for_each_drv+0x84/0xcc
[ 2.893348] __device_attach+0xc0/0x148
[ 2.897289] device_initial_probe+0x18/0x20
[ 2.901587] bus_probe_device+0x38/0x94
[ 2.905529] device_add+0x214/0x3c4
[ 2.909112] platform_device_add+0x198/0x208
[ 2.913497] dwc3_host_init+0x228/0x2bc
[ 2.917437] dwc3_core_init_mode+0xfc/0x18c
[ 2.921735] dwc3_probe+0x978/0xac8
[ 2.925318] platform_drv_probe+0x94/0xb4
[ 2.929432] really_probe+0x138/0x348
[ 2.933200] driver_probe_device+0x80/0xb8
[ 2.937411] __device_attach_driver+0x90/0xa8
[ 2.941881] bus_for_each_drv+0x84/0xcc
[ 2.945823] __device_attach+0xc0/0x148
[ 2.949765] device_initial_probe+0x18/0x20
[ 2.954064] bus_probe_device+0x38/0x94
[ 2.958005] device_add+0x214/0x3c4
[ 2.961590] of_device_add+0x3c/0x48
[ 2.965260] of_platform_device_create_pdata+0xac/0xec
[ 2.970535] of_platform_bus_create+0x1cc/0x348
[ 2.975191] of_platform_populate+0x78/0xc8
[ 2.979490] dwc3_qcom_probe+0x4e0/0xa88
[ 2.983518] platform_drv_probe+0x94/0xb4
[ 2.987632] really_probe+0x138/0x348
[ 2.991400] driver_probe_device+0x80/0xb8
[ 2.995612] __device_attach_driver+0x90/0xa8
[ 3.000084] bus_for_each_drv+0x84/0xcc
[ 3.004025] __device_attach+0xc0/0x148
[ 3.007966] device_initial_probe+0x18/0x20
[ 3.012264] bus_probe_device+0x38/0x94
[ 3.016206] deferred_probe_work_func+0x78/0xb0
[ 3.020864] process_one_work+0x210/0x3dc
[ 3.024979] worker_thread+0x284/0x3e0
[ 3.028833] kthread+0x148/0x1a8
[ 3.032147] ret_from_fork+0x10/0x18
[ 3.035817] ---[ end trace 9bac18cad6a9862f ]---
[ 3.041583] xhci-hcd xhci-hcd.13.auto: can't setup: -12
[ 3.046950] xhci-hcd xhci-hcd.13.auto: USB bus 1 deregistered
[ 3.053107] xhci-hcd: probe of xhci-hcd.13.auto failed with error -12
[ 3.062208] coresight etm0: CPU0: ETM v4.2 initialized
[ 3.063345] sdhci_msm 7c4000.sdhci: TCXO clk not present (-2)
[ 3.067862] coresight etm1: CPU1: ETM v4.2 initialized
[ 3.078829] ------------[ cut here ]------------
[ 3.079076] coresight etm2: CPU2: ETM v4.2 initialized
[ 3.083587] WARNING: CPU: 5 PID: 7 at drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[ 3.083589] Modules linked in:
[ 3.089200] coresight etm3: CPU3: ETM v4.2 initialized
[ 3.098235]
[ 3.098241] CPU: 5 PID: 7 Comm: kworker/u16:0 Tainted: G W 5.10.5 #970
[ 3.101722] coresight etm4: CPU4: ETM v4.2 initialized
[ 3.106672] Workqueue: events_unbound async_run_entry_fn
[ 3.131984] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[ 3.138161] pc : arm_lpae_map_sg+0x234/0x248
[ 3.142557] lr : arm_lpae_map_sg+0xe0/0x248
[ 3.146864] sp : ffffffc0100c3800
[ 3.150272] x29: ffffffc0100c3840 x28: ffffffb942394000
[ 3.155736] x27: 0000000ffffff000 x26: ffffffb944010100
[ 3.161198] x25: 0000000000001000 x24: 0000000102326000
[ 3.166660] x23: 0000000000000001 x22: 0000000102327000
[ 3.172122] x21: 0000000000000003 x20: 0000000000000a20
[ 3.177586] x19: ffffffc0100c3900 x18: 0000000000000001
[ 3.183048] x17: 0000000000000001 x16: 00000000ffffffff
[ 3.188512] x15: 0000000000000000 x14: 0000000000000001
[ 3.193973] x13: 0000000000000001 x12: 0000000000000000
[ 3.199434] x11: 0000001000000000 x10: 0000000000000000
[ 3.204895] x9 : 0000000000000001 x8 : 0000000040201000
[ 3.210356] x7 : 0000000000000000 x6 : ffffffc0100c3900
[ 3.215818] x5 : 0000000000000a20 x4 : 0000000000001000
[ 3.221279] x3 : 0000000000000001 x2 : ffffffb94581a100
[ 3.226741] x1 : 0000000000000003 x0 : 0000000000000e40
[ 3.232203] Call trace:
[ 3.234722] arm_lpae_map_sg+0x234/0x248
[ 3.238761] arm_smmu_map_sg+0x80/0xc4
[ 3.242615] __iommu_map_sg+0x6c/0x188
[ 3.246468] iommu_map_sg_atomic+0x18/0x20
[ 3.250680] iommu_dma_alloc_remap+0x26c/0x34c
[ 3.255250] iommu_dma_alloc+0x9c/0x268
[ 3.259204] dma_alloc_attrs+0x88/0xfc
[ 3.263058] sdhci_setup_host+0x250/0xc8c
[ 3.267185] sdhci_msm_cqe_add_host+0x38/0x188
[ 3.271756] sdhci_msm_probe+0x540/0x628
[ 3.275795] platform_drv_probe+0x94/0xb4
[ 3.279921] really_probe+0x138/0x348
[ 3.283687] driver_probe_device+0x80/0xb8
[ 3.287899] __device_attach_driver+0x90/0xa8
[ 3.292381] bus_for_each_drv+0x84/0xcc
[ 3.296332] __device_attach_async_helper+0x80/0xd4
[ 3.301347] async_run_entry_fn+0x4c/0x100
[ 3.305559] process_one_work+0x210/0x3dc
[ 3.309684] worker_thread+0x234/0x3e0
[ 3.313537] kthread+0x148/0x1a8
[ 3.316861] ret_from_fork+0x10/0x18
[ 3.320541] ---[ end trace 9bac18cad6a98630 ]---
[ 3.325372] mmc1: Unable to allocate ADMA buffers - falling back to standard DMA
<snip>...
[ 3.587535] mmc1: running CQE recovery
[ 3.591419] mmc1: Got command interrupt 0x00010001 even though no command operation was in progress.
[ 3.600796] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 3.607417] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00007202
[ 3.614038] mmc1: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000108

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation