Re: [PATCH v11 09/22] vfio iommu type1: Add task structure to vfio_dma

From: Kirti Wankhede
Date: Tue Nov 08 2016 - 09:17:13 EST

On 11/8/2016 2:33 AM, Alex Williamson wrote:
> On Sat, 5 Nov 2016 02:40:43 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>

...

>>  static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  			   struct vfio_iommu_type1_dma_map *map)
>>  {
>>  	dma_addr_t iova = map->iova;
>>  	unsigned long vaddr = map->vaddr;
>>  	size_t size = map->size;
>> -	long npage;
>>  	int ret = 0, prot = 0;
>>  	uint64_t mask;
>>  	struct vfio_dma *dma;
>> -	unsigned long pfn;
>> +	struct vfio_addr_space *addr_space;
>> +	struct mm_struct *mm;
>> +	bool free_addr_space_on_err = false;
>>
>>  	/* Verify that none of our __u64 fields overflow */
>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>> @@ -608,47 +685,56 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	mutex_lock(&iommu->lock);
>>
>>  	if (vfio_find_dma(iommu, iova, size)) {
>> -		mutex_unlock(&iommu->lock);
>> -		return -EEXIST;
>> +		ret = -EEXIST;
>> +		goto do_map_err;
>> +	}
>> +
>> +	mm = get_task_mm(current);
>> +	if (!mm) {
>> +		ret = -ENODEV;
>
> -EFAULT?
>

The -ENODEV return matches the original code in vfio_pin_pages():

	if (!current->mm)
		return -ENODEV;

At one point I considered changing it to -EFAULT, but changed it back
to -ENODEV to stay consistent with the original error code.

Should I still change this return to -EFAULT?


>> +		goto do_map_err;
>> +	}
>> +
>> +	addr_space = vfio_find_addr_space(iommu, mm);
>> +	if (addr_space) {
>> +		atomic_inc(&addr_space->ref_count);
>> +		mmput(mm);
>> +	} else {
>> +		addr_space = kzalloc(sizeof(*addr_space), GFP_KERNEL);
>> +		if (!addr_space) {
>> +			ret = -ENOMEM;
>> +			goto do_map_err;
>> +		}
>> +		addr_space->mm = mm;
>> +		atomic_set(&addr_space->ref_count, 1);
>> +		list_add(&addr_space->next, &iommu->addr_space_list);
>> +		free_addr_space_on_err = true;
>>  	}
>>
>>  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>>  	if (!dma) {
>> -		mutex_unlock(&iommu->lock);
>> -		return -ENOMEM;
>> +		if (free_addr_space_on_err) {
>> +			mmput(mm);
>> +			list_del(&addr_space->next);
>> +			kfree(addr_space);
>> +		}
>> +		ret = -ENOMEM;
>> +		goto do_map_err;
>>  	}
>>
>>  	dma->iova = iova;
>>  	dma->vaddr = vaddr;
>>  	dma->prot = prot;
>> +	dma->addr_space = addr_space;
>> +	get_task_struct(current);
>> +	dma->task = current;
>> +	dma->mlock_cap = capable(CAP_IPC_LOCK);
>
>
> How do you reason we can cache this? Does the fact that the process
> had this capability at the time that it did a DMA_MAP imply that it
> necessarily still has this capability when an external user (vendor
> driver) tries to pin pages? I don't see how we can make that
> assumption.
>
>

Would a process change its MEMLOCK limit at runtime? I think it
shouldn't; correct me if I'm wrong. QEMU doesn't do that, right?

capable() checks the capability of the current task. But
vfio_pin_pages() can be called from a different task, while the pages
are pinned from the address space of the task that mapped them. So we
can't use capable() in vfio_pin_pages().

If this capability shouldn't be cached, we have to use has_capability()
with dma->task as the argument in vfio_pin_pages():

bool has_capability(struct task_struct *t, int cap)

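For example, a minimal sketch of how the pin-time check could look in
vfio_pin_pages() (just a sketch; npage and the limit accounting shown
here are illustrative, not the exact code from this series):

	/* Check the capability of the task that created the mapping,
	 * at pin time, instead of the cached dma->mlock_cap.
	 */
	bool lock_cap = has_capability(dma->task, CAP_IPC_LOCK);
	unsigned long limit = task_rlimit(dma->task,
					  RLIMIT_MEMLOCK) >> PAGE_SHIFT;

	if (!lock_cap &&
	    dma->addr_space->mm->locked_vm + npage > limit) {
		/* Pinning would exceed the mapping task's
		 * RLIMIT_MEMLOCK and it lacks CAP_IPC_LOCK.
		 */
		ret = -ENOMEM;
	}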
Thanks,
Kirti