Re: [PATCH] PCI: Release coalesced resource

From: Ross Lagerwall
Date: Fri Jun 09 2023 - 11:40:30 EST


> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Sent: Tuesday, June 6, 2023 11:36 PM
> To: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
> Cc: linux-pci@xxxxxxxxxxxxxxx <linux-pci@xxxxxxxxxxxxxxx>; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH] PCI: Release coalesced resource
>  
> On Thu, May 25, 2023 at 04:32:48PM +0100, Ross Lagerwall wrote:
> > When contiguous windows are coalesced, the resource is invalidated and
> > consequently not added to the bus. However, it remains in the resource
> > hierarchy:
> >
> > ...
> >   ef2fff00-ef2fffff : 0000:00:13.2
> >     ef2fff00-ef2fffff : ehci_hcd
> > 00000000-00000000 : PCI Bus 0000:00
> > f0000000-f3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
> >   f0000000-f3ffffff : Reserved
> > ...
>
> I assume the "00000000-00000000 : PCI Bus 0000:00" is the problematic
> part?  Is there anything in dmesg that shows the resources before they
> were coalesced?

Yes, that is the problematic entry, and this patch removes it.

dmesg doesn't show the resources before they were coalesced, but I
captured the output of /proc/iomem with and without the coalescing patch
to see what was being coalesced.

Without coalescing, this region ...

fec00000-fec7ffff : PCI Bus 0000:00
  fec00000-fec003ff : IOAPIC 0
fec80000-fecbffff : PCI Bus 0000:00
  fec80000-fec803ff : IOAPIC 1
  fec90000-fec93fff : pnp 00:06

... gets coalesced into:

fec00000-fecbffff : PCI Bus 0000:00
  fec00000-fec003ff : IOAPIC 0
  fec80000-fec803ff : IOAPIC 1
  fec90000-fec93fff : pnp 00:06

>
> Is there an error message we could include here to link the problem
> with the solution?

The failure appears as two "clipped" messages followed by a BUG when starting
a VM under Xen. That said, I don't think the underlying problem is specific to
Xen; the Xen code just doesn't handle getting back an unexpected resource range.

[ 2783.654292] clipped [mem 0x100000000-0x3fffffffffff] to [mem 0x230a07000-0x3fffffffffff] for e820 entry [mem 0x100000000-0x230a06fff]
[ 2783.654311] clipped [mem 0x230a07000-0x3fffffffffff] to [mem 0x10000000000-0x3fffffffffff] for e820 entry [mem 0xfd00000000-0xffffffffff]
[ 2783.710864] memmap_init_zone_device initialised 32768 pages in 0ms
[ 2783.711124] ------------[ cut here ]------------
[ 2783.711127] kernel BUG at arch/x86/xen/p2m.c:542!
[ 2783.711166] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 2783.711177] CPU: 1 PID: 1795 Comm: xenconsoled Not tainted 6.1.27+0 #1
[ 2783.711189] Hardware name: Dell Inc. PowerEdge R815/0272WF, BIOS 2.8.2 05/21/2012
[ 2783.711200] RIP: e030:xen_alloc_p2m_entry+0x57d/0x930
[ 2783.711222] Code: 3d 90 41 41 01 73 5d 48 8b 05 8f 41 41 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 68 fb ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff
[ 2783.711286] RSP: e02b:ffffc90040d37d80 EFLAGS: 00010246
[ 2783.711297] RAX: 0000000000000000 RBX: 0000000010007fff RCX: fff0000000000fff
[ 2783.711308] RDX: ffffc90040d37d98 RSI: ffffc9008003fff8 RDI: 0000007fc88de067
[ 2783.711318] RBP: ffffc90040d37e28 R08: 0000000000000000 R09: 000ffffffffff000
[ 2783.711328] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9008003fff8
[ 2783.711337] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000010008000
[ 2783.711358] FS: 00007f295754f740(0000) GS:ffff888230640000(0000) knlGS:0000000000000000
[ 2783.711370] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2783.711379] CR2: 00007f4c6344cff0 CR3: 0000000105e46000 CR4: 0000000000040660
[ 2783.711391] Call Trace:
[ 2783.711401] <TASK>
[ 2783.711412] xen_alloc_unpopulated_pages+0xa6/0x430
[ 2783.711429] gnttab_alloc_pages+0x11/0x50
[ 2783.711441] gntdev_alloc_map+0x1d2/0x2e0
[ 2783.711455] gntdev_ioctl+0x261/0x540
[ 2783.711466] __x64_sys_ioctl+0x8a/0xc0
[ 2783.711480] do_syscall_64+0x3b/0x90
[ 2783.711494] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 2783.711506] RIP: 0033:0x7f2956a875d7
[ 2783.711516] Code: 44 00 00 48 8b 05 b9 08 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 08 2d 00 f7 d8 64 89 01 48
[ 2783.711536] RSP: 002b:00007ffcd8292e88 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 2783.711549] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f2956a875d7
[ 2783.711559] RDX: 00007ffcd8292e90 RSI: 0000000000184700 RDI: 000000000000000c
[ 2783.711569] RBP: 00007ffcd8292f30 R08: 00007ffcd8292f5c R09: 00007ffcd8292f58
[ 2783.711579] R10: 00007ffcd82928e0 R11: 0000000000000206 R12: 00007ffcd8292e90
[ 2783.711589] R13: 0000000000000003 R14: 000000000000000c R15: 0000000000000001
[ 2783.711602] </TASK>
[ 2783.711608] Modules linked in: arptable_filter arp_tables tcp_diag udp_diag raw_diag inet_diag netlink_diag ebtable_filter ebtables nfsv3 nfs_acl nfs lockd grace fscache netfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_conncount nf_nat 8021q garp mrp stp llc ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter dm_multipath sunrpc dm_mod crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd ipmi_si psmouse ipmi_devintf k10temp i2c_piix4 sg fam15h_power ipmi_msghandler acpi_power_meter xen_wdt ip_tables x_tables hid_generic usbhid hid sd_mod t10_pi sr_mod cdrom crc64_rocksoft crc64 ohci_pci ahci libahci ehci_pci serio_raw ixgbe ehci_hcd mdio libata ohci_hcd xfrm_algo megaraid_sas bnx2 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod scsi_common ipv6 crc_ccitt
[ 2783.711805] ---[ end trace 0000000000000000 ]---
[ 2783.716538] RIP: e030:xen_alloc_p2m_entry+0x57d/0x930
[ 2783.716553] Code: 3d 90 41 41 01 73 5d 48 8b 05 8f 41 41 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 68 fb ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff
[ 2783.716573] RSP: e02b:ffffc90040d37d80 EFLAGS: 00010246
[ 2783.716585] RAX: 0000000000000000 RBX: 0000000010007fff RCX: fff0000000000fff
[ 2783.716596] RDX: ffffc90040d37d98 RSI: ffffc9008003fff8 RDI: 0000007fc88de067
[ 2783.716608] RBP: ffffc90040d37e28 R08: 0000000000000000 R09: 000ffffffffff000
[ 2783.716620] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9008003fff8
[ 2783.716631] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000010008000
[ 2783.716648] FS: 00007f295754f740(0000) GS:ffff888230640000(0000) knlGS:0000000000000000
[ 2783.716661] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2783.716671] CR2: 00007f4c6344cff0 CR3: 0000000105e46000 CR4: 0000000000040660

This was separately reported here:

https://github.com/QubesOS/qubes-issues/issues/7918#issuecomment-1331763950

The dmesg and /proc/iomem logs (from a somewhat older kernel) are here:

https://pastebin.com/raw/8TQUp2uG

>
> > In some cases (e.g. the Xen scratch region), this causes future calls to
> > allocate_resource() to choose an inappropriate location which the caller
> > cannot handle. Fix by releasing the resource and removing from the
> > hierarchy.
> >
> > Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")
>
> 7c3855c423b1 appeared in v5.16, so we may need a stable tag?

Yes, I think so.

Thanks,
Ross