Re: [PATCH v2] rtlwifi: rtl8723be: Disable ASPM if RTL8723BE connects to Intel PCI bridge

From: Jian-Hong Pan
Date: Wed Nov 15 2023 - 05:40:27 EST


Jonathan Bither <jonbither@xxxxxxxxx> wrote on Tue, Nov 14, 2023 at 11:01 PM:
>
>
> On 11/13/23 22:01, Jian-Hong Pan wrote:
> > Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote on Tue, Nov 14, 2023 at 9:41 AM:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jian-Hong Pan <jhp@xxxxxxxxxxxxx>
> >>> Sent: Monday, November 13, 2023 12:35 PM
> >>> To: Larry Finger <Larry.Finger@xxxxxxxxxxxx>; Ping-Ke Shih <pkshih@xxxxxxxxxxx>
> >>> Cc: Kalle Valo <kvalo@xxxxxxxxxx>; linux-wireless@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >>> linux@xxxxxxxxxxxxx; Jian-Hong Pan <jhp@xxxxxxxxxxxxx>
> >>> Subject: [PATCH v2] rtlwifi: rtl8723be: Disable ASPM if RTL8723BE connects to Intel PCI bridge
> >>>
> >>> Disable rtl8723be's ASPM if the Realtek RTL8723BE PCIe Wireless adapter
> >>> connects to some Intel PCI bridges, such as Skylake and Kabylake.
> >>> Otherwise, the PCI AER flood hangs system:
> >>>
> >>> pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> >>> pcieport 0000:00:1c.5: device [8086:9d15] error status/mask=00000001/00002000
> >>> pcieport 0000:00:1c.5: [ 0] RxErr (First)
> >>> pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
> >>> pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> >>> pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
> >>> pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> >>> pcieport 0000:00:1c.5: AER: Multiple Corrected error received: 0000:00:1c.5
> >>> pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> >>>
> >>> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=218127
> >> Seemingly, you can use "Link" or "Closes" tag.
> >>
> >>> Signed-off-by: Jian-Hong Pan <jhp@xxxxxxxxxxxxx>
> >> Acked-by: Ping-Ke Shih <pkshih@xxxxxxxxxxx>
> >>
> >>
> >>> ---
> >>> v2: Add the switch case's default condition with comment:
> >>> "The ASPM has already been enabled by initializing
> >>> rtl8723be_mod_params' aspm_support as 1."
> >>>
> >>> .../wireless/realtek/rtlwifi/rtl8723be/sw.c | 24 +++++++++++++++++++
> >>> 1 file changed, 24 insertions(+)
> >>>
> >>> diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c
> >>> index 43b611d5288d..b20c0b9d8393 100644
> >>> --- a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c
> >>> +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c
> >>> @@ -25,10 +25,34 @@ static void rtl8723be_init_aspm_vars(struct ieee80211_hw *hw)
> >>> {
> >>> struct rtl_priv *rtlpriv = rtl_priv(hw);
> >>> struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
> >>> + struct pci_dev *bridge_pdev;
> >>>
> >>> /*close ASPM for AMD defaultly */
> >>> rtlpci->const_amdpci_aspm = 0;
> >>>
> >>> + /* Disable ASPM if RTL8723BE connects to some Intel PCI bridges, such as
> >>> + * Skylake and Kabylake. Otherwise, the PCI AER flood hangs system.
> >>> + */
> >>> + bridge_pdev = rtlpci->pdev->bus->self;
> >>> + if (bridge_pdev->vendor == PCI_VENDOR_ID_INTEL) {
> >>> + switch(bridge_pdev->device) {
> >>> + case 0x9d15:
> >>> + /* PCI bridges on Skylake */
> >>> + case 0xa110 ... 0xa11f:
> >>> + case 0xa167 ... 0xa16a:
> >>> + /* PCI bridges on Kabylake */
> >>> + case 0xa290 ... 0xa29f:
> >>> + case 0xa2e7 ... 0xa2ee:
> >> Out of curiosity, do you have so many real platforms to test?
> > We tested those platforms earlier, as part of hardware
> > enablement. They all show the same error, and the error has bothered
> > people for many years.
> > https://groups.google.com/g/fa.linux.kernel/c/0uz8Nr_NVOI
> >
> > However, most of them have been returned to their owners by now. We
> > happen to still have the ASUS X555UQ, equipped with an Intel i7-6500U CPU
> > and a Realtek RTL8723BE PCIe Wireless adapter, on hand for more testing.
>
> The device matching that you're doing follows what was also done in
> commit 7184f5b451cf3dc61de79091d235b5d2bba2782d for an ACS quirk on the
> same chipsets.
>
> I'm just curious if the issue is a more universal Intel one and
> can/should be resolved with a pci quirk as opposed to inside an
> individual driver.

Interesting idea. I ran a test with the following changes:

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c
index 43b611d5288d..edb08247760c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c
@@ -26,6 +26,10 @@ static void rtl8723be_init_aspm_vars(struct ieee80211_hw *hw)
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));

+ /* Disable ASPM if the link control disables it */
+ if (!pcie_aspm_enabled(rtlpci->pdev))
+ rtlpriv->cfg->mod_params->aspm_support = 0;
+
/*close ASPM for AMD defaultly */
rtlpci->const_amdpci_aspm = 0;

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index eeec1d6f9023..239ae945df00 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3606,6 +3606,37 @@ DECLARE_PCI_FIXUP_FINAL(0x1b7c, 0x0004, /* Ceton InfiniTV4 */
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REALTEK, 0x8169,
quirk_broken_intx_masking);

+
+static void quirk_disable_rtl_aspm(struct pci_dev *dev)
+{
+ struct pci_dev *pdev;
+ u16 val;
+
+ if (dev->bus && dev->bus->self)
+ pdev = dev->bus->self;
+ else
+ return;
+
+ if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
+ switch (pdev->device) {
+ case 0x9d15:
+ /* PCI bridges on Skylake */
+ case 0xa110 ... 0xa11f:
+ case 0xa167 ... 0xa16a:
+ /* PCI bridges on Kabylake */
+ case 0xa290 ... 0xa29f:
+ case 0xa2e7 ... 0xa2ee:
+ pci_info(dev, "quirk: disable the device's ASPM\n");
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &val);
+ val &= ~PCI_EXP_LNKCTL_ASPMC;
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL, val);
+ break;
+ }
+ }
+}
+
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_REALTEK, 0xb723, quirk_disable_rtl_aspm);
+
/*
* Intel i40e (XL710/X710) 10/20/40GbE NICs all have broken INTx masking,
* DisINTx can be set but the interrupt status bit is non-functional.

Even though quirk_disable_rtl_aspm() disables ASPM in the device's PCIe
Link Control register, we still have to clear aspm_support in the
rtl8723be module. Otherwise, ASPM gets enabled again and the AER flood
comes back.
If the rtl8723be module should, as a general rule, check the PCIe ASPM
state first, then the PCI quirk approach is feasible.
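
To make the interaction between the two hunks easier to follow, here is a
rough userspace model of it. This is only a sketch, not kernel code:
struct model_dev, model_quirk(), model_driver_init() and the
check_link_first flag are hypothetical names made up for illustration,
and the "link check" is simplified to looking at the ASPM bits in the
register image, which is an assumption about what pcie_aspm_enabled()
would report in this situation. The bridge IDs and the ASPM mask value
are the ones used in the test patch.

```c
/* Userspace model only; none of these helpers exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_INTEL 0x8086u /* value of PCI_VENDOR_ID_INTEL */
#define LNKCTL_ASPMC 0x0003u /* value of PCI_EXP_LNKCTL_ASPMC (L0s | L1) */

/* Same bridge device IDs as the quirk, written without GNU case ranges. */
static bool bridge_is_affected(uint16_t vendor, uint16_t device)
{
	if (vendor != VENDOR_INTEL)
		return false;

	return device == 0x9d15 ||
	       (device >= 0xa110 && device <= 0xa11f) ||
	       (device >= 0xa167 && device <= 0xa16a) ||
	       (device >= 0xa290 && device <= 0xa29f) ||
	       (device >= 0xa2e7 && device <= 0xa2ee);
}

/* Hypothetical stand-in for the wireless adapter's state. */
struct model_dev {
	uint16_t lnkctl;   /* image of the Link Control register */
	bool aspm_support; /* mirrors the module's aspm_support setting */
};

/* Quirk step: clear only the ASPM control bits behind an affected bridge. */
static void model_quirk(struct model_dev *d, uint16_t br_vendor,
			uint16_t br_device)
{
	assert(d != NULL);

	if (bridge_is_affected(br_vendor, br_device))
		d->lnkctl &= (uint16_t)~LNKCTL_ASPMC;
}

/*
 * Driver step: optionally respect a link that already has ASPM off,
 * then turn ASPM back on only if aspm_support is still set.
 */
static void model_driver_init(struct model_dev *d, bool check_link_first)
{
	assert(d != NULL);

	if (check_link_first && (d->lnkctl & LNKCTL_ASPMC) == 0)
		d->aspm_support = false;

	if (d->aspm_support)
		d->lnkctl |= LNKCTL_ASPMC;
}
```

In this model, a device starting with lnkctl = 0x0043 behind bridge
8086:9d15 is left at 0x0040 by the quirk step. Running the driver step
without the link check sets the two ASPM bits again (0x0043), while
running it with the check keeps them clear. That is the reason the sw.c
hunk is needed in addition to the quirk.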

Ping-Ke Shih, any suggestions? If this is the better approach, I can
prepare a new version of the patch.

Jian-Hong Pan