Re: [PATCH] iommu/arm-smmu-v3: Add a threshold to avoid potential soft lockup

From: Jason Gunthorpe
Date: Thu Jan 18 2024 - 09:23:21 EST


On Thu, Jan 18, 2024 at 07:21:10PM +0800, zhangzekun (A) wrote:
> > What is the rest of the call chain? How did you get into such a large
> > invalidation?
> we are doing some test with qemu virtual machines. The testing platform has
> 96 cores and 960GB memories. It will enable 64 virtual machines and every
> virtual machine will cost 8G memories. When batching closing these 64
> virtaul machine concurrently, we will get soft lockup.
>
>  WARN: soft lockup - CPU#18 stuck for 11s! [worker:3219]
> ....
> Call trace:
>  dump_backtrace+0x0/0x200
>  show_stack+0x20/0x30
>  dump_stack+0xf0/0x138
>  watchdog_print_info+0x48/0x54
>  watchdog_process_before_softlockup+0x9c/0xa0
>  watchdog_timer_fn+0x1ac/0x2f0
>  __run_hrtimer+0x98/0x2b4
>  __hrtimer_run_queues+0xc0/0x13c
>  hrtimer_interrupt+0x150/0x3e4
>  arch_timer_handler_phys+0x3c/0x50
>  handle_percpu_devid_irq+0x90/0x1f4
>  __handle_domain_irq+0x84/0xfc
>  gic_handle_irq+0x88/0x2b0
>  el1_irq+0xb8/0x140
>  ktime_get+0x3c/0xac
>  arm_smmu_cmdq_poll_until_not_full+0xb0/0x1a0
>  arm_smmu_cmdq_issue_cmdlist+0x194/0x5f4
>  __arm_smmu_tlb_inv_range+0x114/0x22c
>  arm_smmu_tlb_inv_walk+0x88/0x120
>  __arm_lpae_unmap+0x188/0x2c0
>  __arm_lpae_unmap+0x104/0x2c0
>  arm_lpae_unmap+0x68/0x80
>  arm_smmu_unmap+0x24/0x40
>  __iommu_unmap+0xd8/0x210
>  iommu_unmap_fast+0x18/0x24
>  unmap_unpin_fast+0x7c/0x140 [vfio_iommu_type1]
>  vfio_unmap_unpin+0x14c/0x3ac [vfio_iommu_type1]
>  vfio_remove_dma+0x38/0x124 [vfio_iommu_type1]
>  vfio_iommu_type1_detach_group+0x4b8/0x4e0 [vfio_iommu_type1]
>  __vfio_group_unset_container+0x58/0x18c [vfio]
>  vfio_group_try_dissolve_container+0x80/0x94 [vfio]
>  vfio_group_put_external_user+0x20/0x54 [vfio]
>  kvm_vfio_destroy+0xa8/0x12c
>  kvm_destroy_vm+0x20c/0x300
>  kvm_put_kvm+0x74/0xa0
>  kvm_vcpu_release+0x20/0x30
>  __fput+0xc4/0x270
>  ____fput+0x18/0x24
>  task_work_run+0xd0/0x1a0
>  do_exit+0x1d8/0x480
>  do_group_exit+0x40/0x130
>  get_signal+0x1f8/0x744
>  do_signal+0x98/0x204
>  do_notify_resume+0x15c/0x1e0
>  work_pending+0xc/0xa4

I see, that is interesting.

I think iommufd is close to substantialy improving this. It already
has the domain destruction ordering:
- Remove all domain attachments
- Read back phys addrs, unmap and unpin
- iommu_domain_free()

Currently smmu continues to issue ASID invalidations for the domain
even though there no references to it. However this is pretty
pointless here as we are going to free the ASID right away.

I'm going to make a series to allow smmu to support multi-instances on
a single domain. In this case when the domain looses all its
attachments it looses all its instances too and it won't have to do
*anything* for invalidation in this workflow.

Ie your flow will boil down to a single ASID invalidation once the
domain attachments are all removed, then on invalidations during unmap
and no invaludation at free. This would be a significantly faster
teardown I suspect.

> Thanks for your patch, the patch described below make sense to me. After
> spliting TLB invalidate operations according to domain stage,
> arm_smmu_tlb_inv_*_s1() can be used both in SVA codes and original smmu
> codes, and can implement arm_smmu_tlb_inv_all_s1() just using
> arm_smmu_tlb_inv_range_s1() by passing a special size. But the current
> smmu-v3 driver has not move "asid" into "struct smmu_domain", we still need
> to pass the exact asid from SVA.

Yes, it is not intended to be applied on the current kernel, there are
a lot of structural problems with how SVA works there..

Jason