Re: [PATCH v3] PCI: Align pci memory space base address with page size

From: bibo, mao
Date: Tue Jun 06 2023 - 21:43:30 EST




在 2023/6/7 04:45, Bjorn Helgaas 写道:
> On Tue, Jun 06, 2023 at 06:06:04PM +0800, bibo, mao wrote:
>> Huacai,
>>
>> Although I post this patch, I think this should be arch specified
>> rather than general problem. X86 has solved this problem, arm64
>> with 64K page size is not popular. However LoongArch has this
>> problem, page size is 16K rather than 4K. It is the problem of
>> LoongArch system rather than generic issue.
>
> I think this is only related to the page size, not to any
> LoongArch-specific details. If that's the case, I don't think the
> change should be arch-specific.
>
>> There is such discussion before:
>> https://patchwork.kernel.org/project/linux-pci/patch/22400b8828ad44ddbccb876cc5ca3b11@xxxxxxxxxxxxxxxxxxxxxxx/#19319457
>>
>> UEFI bios sets pci memory space 4K aligned, however Loongarch kernel rescans the pci
>> bus and reassigns pci memory resource. So it it strange like this, here is pci memory info on
>> my 7A2000 board.
>> root@user-pc:~# lspci -vvv | grep Region
>> Region 5: Memory at e003526e800 (32-bit, non-prefetchable) [size=1K]
>> Region 0: Memory at e003526ec00 (64-bit, non-prefetchable) [size=1K]
>
> I guess these BARs are from different devices, and the problem you're
> pointing out is that both BARs end up in the same 16K page (actually
> even the same 4K page):
>
> (gdb) p/x 0xe003526e800 >> 12
> $1 = 0xe003526e
> (gdb) p/x 0xe003526ec00 >> 12
> $2 = 0xe003526e
>
> I agree that's a potential issue because a user mmap of the first
> device also gets access to the BAR of the second device. But it
> doesn't seem to be a *new* or LoongArch-specific issue.
>
> So I *think* the point of this patch is to ensure that BARs of
> different devices never share a page. Why is that suddenly an issue
> for LoongArch? Is it only because you see more sharing with 16K pages
> than other arches do with 4K pages?
The UEFI BIOS has assigned the PCI device with minimal 4K aligned, I guest
Linux kernel on X86/ARM uses resource directly rather than reassign resource
again. So there is problem for hot-plug devices, however most hot-plug devices
are used for server system and they are high speed devices, PCI resource size
is larger than 4K. So it is not obvious on x86/ARM system.

However on LoongArch system with page size 16K, the problem is obvious since
pci devices with 4K resource size are popular.
>
>> 在 2023/6/6 16:45, Bibo Mao 写道:
>>> Some PCI devices have only 4K memory space size, it is normal in general
>>> machines and aligned with page size. However some architectures which
>>> support different page size, default page size on LoongArch is 16K, and
>>> ARM64 supports page size varying from 4K to 64K. On machines where larger
>>> page size is use, memory space region of two different pci devices may be
>>> in one page. It is not safe with mmu protection, also VFIO pci device
>>> driver requires base address of pci memory space page aligned, so that it
>>> can be memory mapped to qemu user space when it is passed-through to vm.
>
> The minimum memory BAR per spec is 128 bytes (not 4K). You just
> demonstrated that even with 4K pages, different devices can share a
> page.
>
>>> It consumes more pci memory resource with page size alignment requirement,
>>> it should not be a problem on 64 bit system.
>>>
>>> Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
>>> ---
>>> drivers/pci/setup-res.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>>> index 967f9a758923..55440ae0128d 100644
>>> --- a/drivers/pci/setup-res.c
>>> +++ b/drivers/pci/setup-res.c
>>> @@ -339,6 +339,14 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
>>> return -EINVAL;
>>> }
>>>
>>> +#ifdef CONFIG_64BIT
>>> + /*
>>> + * force minimum page alignment for vfio pci usage
>>> + * supposing there is enough pci memory resource on 64bit system
>>> + */
>>> + if (res->flags & IORESOURCE_MEM)
>>> + align = max_t(resource_size_t, PAGE_SIZE, align);
>>> +#endif
>
> Why is this under CONFIG_64BIT? That doesn't have anything to do with
> the BAR size. The only reason I can see is that with CONFIG_64BIT=y,
> we *might* have more MMIO space to use for BARs.
yes, I assume that there is enough PCI memory resource with 64 bit system,
after all it consumes more memory resource and brings out negative effect.
For UIO and SRIOV requirements, they are most 64-bit server system, they
can suffer from wasting some PCI memory resource.

Regards
Bibo, Mao
>
>>> size = resource_size(res);
>>> ret = _pci_assign_resource(dev, resno, size, align);