Re: [PATCH v2 10/11] iommu/vt-d: Use xarray for global device_domain_info

From: Jason Gunthorpe
Date: Mon Feb 14 2022 - 09:00:48 EST


On Mon, Feb 14, 2022 at 10:57:03AM +0800, Lu Baolu wrote:
> Replace the existing global device_domain_list with an array so that it
> could be rapidly searched. The index of the array is composed by the PCI
> segment, bus and devfn. Use RCU for lock protection.
>
> Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> include/linux/intel-iommu.h | 1 -
> drivers/iommu/intel/iommu.c | 72 ++++++++++++++++++-------------------
> 2 files changed, 34 insertions(+), 39 deletions(-)
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 03f1134fc2fe..aca1c1cc04a8 100644
> +++ b/include/linux/intel-iommu.h
> @@ -610,7 +610,6 @@ struct intel_iommu {
> /* PCI domain-device relationship */
> struct device_domain_info {
> struct list_head link; /* link to domain siblings */
> - struct list_head global; /* link to global list */
> struct list_head table; /* link to pasid table */
> u32 segment; /* PCI segment number */
> u8 bus; /* PCI bus number */
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index fb17ed8c08f3..ecec923ce191 100644
> +++ b/drivers/iommu/intel/iommu.c
> @@ -131,8 +131,6 @@ static struct intel_iommu **g_iommus;
>
> static void __init check_tylersburg_isoch(void);
> static int rwbf_quirk;
> -static inline struct device_domain_info *
> -dmar_search_domain_by_dev_info(int segment, int bus, int devfn);
>
> /*
> * set to 1 to panic kernel if can't successfully enable VT-d
> @@ -318,30 +316,34 @@ int intel_iommu_gfx_mapped;
> EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
>
> DEFINE_SPINLOCK(device_domain_lock);
> -static LIST_HEAD(device_domain_list);
> +static DEFINE_XARRAY_ALLOC(device_domain_array);

The 'device_domain_lock' should be replaced by the xarray's internal
spinlock; there is no reason to have two locks.

> +
> +/* Convert device source ID into the index of device_domain_array. */
> +static inline unsigned long devi_idx(unsigned long seg, u8 bus, u8 devfn)
> +{
> + return (seg << 16) | PCI_DEVID(bus, devfn);
> +}
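
For anyone checking the index layout, here is a minimal userspace sketch
of the packing that devi_idx() performs. PCI_DEVID() is redefined locally
as a stand-in for the kernel macro (bus in bits 15:8, devfn in bits 7:0),
and the idx_to_*() helpers are hypothetical, added only to show that the
index can be decoded again. They are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's PCI_DEVID(): bus in bits 15:8, devfn in bits 7:0. */
#define PCI_DEVID(bus, devfn) ((((uint16_t)(bus)) << 8) | (devfn))

/* Same packing as devi_idx() in the patch: segment sits above the 16-bit bus/devfn ID. */
static inline unsigned long devi_idx(unsigned long seg, uint8_t bus, uint8_t devfn)
{
	return (seg << 16) | PCI_DEVID(bus, devfn);
}

/* Hypothetical inverse helpers (illustration only, not in the patch). */
static inline unsigned long idx_to_seg(unsigned long idx)
{
	return idx >> 16;
}

static inline uint8_t idx_to_bus(unsigned long idx)
{
	return (idx >> 8) & 0xff;
}

static inline uint8_t idx_to_devfn(unsigned long idx)
{
	return idx & 0xff;
}
```

With this layout, segment 1, bus 0x3a, devfn 0x10 maps to index 0x13a10,
so each (segment, bus, devfn) triple gets a distinct slot.
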
>
> /*
> - * Iterate over elements in device_domain_list and call the specified
> + * Iterate over elements in device_domain_array and call the specified
> * callback @fn against each element.
> */
> int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> void *data), void *data)
> {
> - int ret = 0;
> - unsigned long flags;
> struct device_domain_info *info;
> + unsigned long index;
> + int ret = 0;
>
> - spin_lock_irqsave(&device_domain_lock, flags);
> - list_for_each_entry(info, &device_domain_list, global) {
> + rcu_read_lock();
> + xa_for_each(&device_domain_array, index, info) {
> ret = fn(info, data);
> - if (ret) {
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> - return ret;
> - }
> + if (ret)
> + break;

And you probably shouldn't try to use RCU here. It is really unclear how
this function can be useful while racing against
intel_iommu_release_device(), e.g. today the only user of this function
does:

static int search_pasid_table(struct device_domain_info *info, void *opaque)
{
	struct pasid_table_opaque *data = opaque;

	if (info->iommu->segment == data->segment &&
	    info->bus == data->bus &&
	    info->devfn == data->devfn &&

And even if you kfree_rcu(info), 'info->iommu->' is still racy when
accessed unlocked.

RCU is complicated to use; it is not just a drop-in replacement for a
spinlock.

Jason