Re: [PATCH v1 4/4] vfio/nvgpu: register device memory for poison handling

From: Naoya Horiguchi
Date: Tue Sep 26 2023 - 03:39:16 EST


On Wed, Sep 20, 2023 at 07:32:10PM +0530, ankita@xxxxxxxxxx wrote:
> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
> (Qemu) using remap_pfn_range() without adding the memory to the kernel.
> The device memory pages are not backed by struct page. Patches 1-3
> implement a mechanism to handle ECC/poison on memory pages without
> struct page and expose a registration function. This new mechanism is
> leveraged here.
>  
> The module registers its memory region with the kernel MM for ECC handling
> using the register_pfn_address_space() registration API exposed by the
> kernel. It also defines a failure callback function pfn_memory_failure()
> to get the poisoned PFN from the MM.
>  
> The module tracks poisoned PFNs in a bitmap with one bit per PFN. The PFN
> is communicated by the kernel MM to the module through the failure
> function, which sets the appropriate bit in the bitmap.
>  
> The module also defines VMA fault ops. The fault handler returns
> VM_FAULT_HWPOISON if the bit for the PFN is set in the bitmap.
>
> [1] https://lore.kernel.org/all/20230915025415.6762-1-ankita@xxxxxxxxxx/
>
> Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
> ---

...

> @@ -406,6 +494,19 @@ nvgrace_gpu_vfio_pci_fetch_memory_property(struct pci_dev *pdev,
>
>  	nvdev->memlength = memlength;
>
> +#ifdef CONFIG_MEMORY_FAILURE
> +	/*
> +	 * A bitmap is maintained to track the pages that are poisoned. Each
> +	 * page is represented by a bit. Allocation size in bytes is
> +	 * determined by shifting the device memory size by PAGE_SHIFT to
> +	 * determine the number of pages; and further shifted by 3 as each
> +	 * byte could track 8 pages.
> +	 */
> +	nvdev->pfn_bitmap
> +		= vzalloc((nvdev->memlength >> PAGE_SHIFT)/BITS_PER_TYPE(char));
> +	if (!nvdev->pfn_bitmap)
> +		ret = -ENOMEM;
> +#endif
>  	return ret;
>  }
>

I assume that memory failure is a relatively rare event (otherwise the device
is simply broken and it's better to stop using it), so the bitmap will be
mostly zeros.
If the device memory is on the order of 100GB, the bitmap is about 3.2MB
(assuming 4KB pages). That might not be too large on modern systems, but
wouldn't another data structure with a smaller memory footprint, such as a
hash table, be more beneficial?
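
To make the comparison concrete, here is a rough userspace sketch, not
kernel code and not a proposal for the actual patch. bitmap_bytes() only
reproduces the size arithmetic, and struct pfn_set with its helpers is a
made-up, fixed-capacity open-addressing set used purely for illustration.
In the kernel, the generic hashtable in <linux/hashtable.h> or an xarray
could presumably play the same role.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for a one-bit-per-page bitmap, rounded up to whole bytes. */
static size_t bitmap_bytes(uint64_t mem_bytes, unsigned int page_shift)
{
	uint64_t pages = mem_bytes >> page_shift;

	return (size_t)((pages + 7) / 8);
}

/*
 * Hypothetical sparse set of poisoned PFNs (illustration only).
 * Open addressing with linear probing. A slot stores pfn + 1 so that
 * 0 can mean "empty"; this assumes pfn never equals UINT64_MAX.
 */
struct pfn_set {
	uint64_t *slots;
	size_t cap;	/* number of slots, must be a power of two */
	size_t used;	/* number of occupied slots */
};

static int pfn_set_init(struct pfn_set *s, size_t cap)
{
	assert(cap && (cap & (cap - 1)) == 0);
	s->slots = calloc(cap, sizeof(*s->slots));
	s->cap = cap;
	s->used = 0;
	return s->slots ? 0 : -1;
}

static void pfn_set_free(struct pfn_set *s)
{
	free(s->slots);
	s->slots = NULL;
	s->cap = 0;
	s->used = 0;
}

/* Multiplicative hash, masked down to a slot index. */
static size_t pfn_hash(uint64_t pfn, size_t cap)
{
	return (size_t)((pfn * 0x9E3779B97F4A7C15ull) >> 32) & (cap - 1);
}

static bool pfn_set_contains(const struct pfn_set *s, uint64_t pfn)
{
	size_t i = pfn_hash(pfn, s->cap);

	while (s->slots[i]) {
		if (s->slots[i] == pfn + 1)
			return true;
		i = (i + 1) & (s->cap - 1);
	}
	return false;
}

/* Returns 0 on success (or if already present), -1 if the set is full. */
static int pfn_set_add(struct pfn_set *s, uint64_t pfn)
{
	size_t i = pfn_hash(pfn, s->cap);

	while (s->slots[i]) {
		if (s->slots[i] == pfn + 1)
			return 0;
		i = (i + 1) & (s->cap - 1);
	}
	/* Always keep one slot empty so that probing terminates. */
	if (s->used + 1 >= s->cap)
		return -1;
	s->slots[i] = pfn + 1;
	s->used++;
	return 0;
}
```

With these helpers, bitmap_bytes(100ULL << 30, 12) gives 3276800 bytes,
while a 64-slot pfn_set takes 512 bytes for its slots. The trade-off is
that the set above has a fixed capacity and the failure callback would
have to cope with it filling up, which the bitmap never has to do.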

Thanks,
Naoya Horiguchi