Re: [PATCH v8 11/14] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR

From: Sean V Kelley
Date: Fri Oct 09 2020 - 18:07:25 EST


On 9 Oct 2020, at 14:30, Bjorn Helgaas wrote:

On Fri, Oct 09, 2020 at 12:57:45PM -0500, Bjorn Helgaas wrote:
On Fri, Oct 02, 2020 at 11:47:32AM -0700, Sean V Kelley wrote:
From: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>

When attempting error recovery for an RCiEP associated with an RCEC device,
there needs to be a way to update the Root Error Status, the Uncorrectable
Error Status and the Uncorrectable Error Severity of the parent RCEC.
In some non-native cases in which there is no OS visible device
associated with the RCiEP, there is nothing to act upon as the firmware
is acting before the OS. So add handling for the linked 'rcec' in AER/ERR
while taking into account non-native cases.

Co-developed-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx>
Signed-off-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
---
drivers/pci/pcie/aer.c | 9 +++++----
drivers/pci/pcie/err.c | 39 ++++++++++++++++++++++++++++-----------
2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 65dff5f3457a..dccdba60b5d9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1358,17 +1358,18 @@ static int aer_probe(struct pcie_device *dev)
static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
{
int aer = dev->aer_cap;
+ int rc = 0;
u32 reg32;
- int rc;
-

/* Disable Root's interrupt in response to error messages */
pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);

- rc = pci_bus_error_reset(dev);
- pci_info(dev, "Root Port link has been reset\n");
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
+ rc = pci_bus_error_reset(dev);
+ pci_info(dev, "Root Port link has been reset\n");
+ }

/* Clear Root Error Status */
pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 38abd7984996..956ad4c86d53 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -149,7 +149,8 @@ static int report_resume(struct pci_dev *dev, void *data)
/**
* pci_walk_bridge - walk bridges potentially AER affected
* @bridge bridge which may be an RCEC with associated RCiEPs,
- * an RCiEP associated with an RCEC, or a Port.
+ * or a Port.
+ * @dev an RCiEP lacking an associated RCEC.
* @cb callback to be called for each device found
* @userdata arbitrary pointer to be passed to callback.
*
@@ -160,13 +161,20 @@ static int report_resume(struct pci_dev *dev, void *data)
* If the device provided has no subordinate bus, call the provided
* callback on the device itself.
*/
-static void pci_walk_bridge(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *),
+static void pci_walk_bridge(struct pci_dev *bridge, struct pci_dev *dev,
+ int (*cb)(struct pci_dev *, void *),
void *userdata)
{
- if (bridge->subordinate)
+ /*
+ * In a non-native case where there is no OS-visible reporting
+ * device the bridge will be NULL, i.e., no RCEC, no PORT.
+ */
+ if (bridge && bridge->subordinate)
pci_walk_bus(bridge->subordinate, cb, userdata);
- else
+ else if (bridge)
cb(bridge, userdata);
+ else
+ cb(dev, userdata);
}

static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
@@ -196,16 +204,25 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
type = pci_pcie_type(dev);
if (type == PCI_EXP_TYPE_ROOT_PORT ||
type == PCI_EXP_TYPE_DOWNSTREAM ||
- type == PCI_EXP_TYPE_RC_EC ||
- type == PCI_EXP_TYPE_RC_END)
+ type == PCI_EXP_TYPE_RC_EC)
bridge = dev;
+ else if (type == PCI_EXP_TYPE_RC_END)
+ bridge = dev->rcec;
else
bridge = pci_upstream_bridge(dev);

pci_dbg(dev, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
- pci_walk_bridge(bridge, report_frozen_detected, &status);
+ pci_walk_bridge(bridge, dev, report_frozen_detected, &status);
if (type == PCI_EXP_TYPE_RC_END) {
+ /*
+ * The callback only clears the Root Error Status
+ * of the RCEC (see aer.c). Only perform this for the
+ * native case, i.e., an RCEC is present.
+ */
+ if (bridge)
+ reset_subordinate_devices(bridge);

Help me understand this. There are lots of callbacks in this picture,
but I guess this "callback only clears Root Error Status" must refer
to aer_root_reset(), i.e., the reset_subordinate_devices pointer?

Of course, the caller of pcie_do_recovery() supplied that pointer.
And we can infer that it must be aer_root_reset(), not
dpc_reset_link(), because RCECs and RCiEPs are not allowed to
implement DPC.

I wish we didn't have either this assumption about what
reset_subordinate_devices points to, or the assumption about what
aer_root_reset() does. They both seem a little bit tenuous.

We already made aer_root_reset() smart enough to check for RCECs. Can
we put the FLR there, too? Then we wouldn't have this weird situation
where reset_subordinate_devices() does a reset and clears error
status, EXCEPT for this case where it only clears error status and we
do the reset here?

Just as an example to make this concrete. Doesn't even compile.

Okay, I tried something similar. Comments below:



diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d6927e6535e5..e389db7cbba6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1372,28 +1372,45 @@ static int aer_probe(struct pcie_device *dev)
*/
static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
{
- int aer = dev->aer_cap;
+ int type = pci_pcie_type(dev);
+ struct pci_dev *root;
+ int aer = 0;
int rc = 0;
u32 reg32;

- /* Disable Root's interrupt in response to error messages */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
- reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ if (type == PCI_EXP_TYPE_RC_END)
+ root = dev->rcec;
+ else
+ root = dev;
+
+ if (root)
+ aer = root->aer_cap;

Except here dev->rcec may be NULL for the non-native case. So I had a goto label to bypass what follows. I left the FLR in err.c.



- if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
+ if (aer) {
+ /* Disable Root's interrupt in response to error messages */
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+ reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ }
+
+ if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
+ rc = flr_on_rciep(dev);
+ pci_info(dev, "has been reset (%d)\n", rc);
+ } else {
rc = pci_bus_error_reset(dev);
- pci_info(dev, "Root Port link has been reset\n");
+ pci_info(dev, "Root Port link has been reset (%d)\n", rc);
}

- /* Clear Root Error Status */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
+ if (aer) {
+ /* Clear Root Error Status */
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);

- /* Enable Root Port's interrupt in response to error messages */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
- reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ /* Enable Root Port's interrupt in response to error messages */
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+ reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ }

return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
}
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 79ae1356141d..08976034a89c 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -203,36 +203,19 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
*/
if (type == PCI_EXP_TYPE_ROOT_PORT ||
type == PCI_EXP_TYPE_DOWNSTREAM ||
- type == PCI_EXP_TYPE_RC_EC)
+ type == PCI_EXP_TYPE_RC_EC ||
+ type == PCI_EXP_TYPE_RC_END)
bridge = dev;
- else if (type == PCI_EXP_TYPE_RC_END)
- bridge = dev->rcec;
else
bridge = pci_upstream_bridge(dev);

pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bridge(bridge, dev, report_frozen_detected, &status);

We’d need to walk back the dev addition to the pci_walk_bridge() and revert to prior to the checks on dev->rcec.

- if (type == PCI_EXP_TYPE_RC_END) {
- /*
- * The callback only clears the Root Error Status
- * of the RCEC (see aer.c). Only perform this for the
- * native case, i.e., an RCEC is present.
- */
- if (bridge)
- reset_subordinates(bridge);
-
- status = flr_on_rciep(dev);
- if (status != PCI_ERS_RESULT_RECOVERED) {
- pci_warn(dev, "Function Level Reset failed\n");
- goto failed;
- }
- } else {
- status = reset_subordinates(bridge);
- if (status != PCI_ERS_RESULT_RECOVERED) {
- pci_warn(dev, "subordinate device reset failed\n");
- goto failed;
- }
+ status = reset_subordinates(bridge);
+ if (status != PCI_ERS_RESULT_RECOVERED) {
+ pci_warn(bridge, "subordinate device reset failed\n");
+ goto failed;
}
} else {
pci_walk_bridge(bridge, dev, report_normal_detected, &status);

So this is what I experimented with for the most part but with the changes that I highlighted - addressing non-native case in the callback and also in the bridge walk.

I can test it out.

Thanks,

Sean