Re: bcm2711_thermal: Kernel panic - not syncing: Asynchronous SError Interrupt

From: Juerg Haefliger
Date: Wed Jul 27 2022 - 04:05:24 EST


On Wed, 10 Feb 2021 14:59:45 -0800
Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:

> On 2/10/2021 8:55 AM, Nicolas Saenz Julienne wrote:
> > Hi Robin,
> >
> > On Wed, 2021-02-10 at 16:25 +0000, Robin Murphy wrote:
> >> On 2021-02-10 13:15, Nicolas Saenz Julienne wrote:
> >>> [ Add Robin, Catalin and Florian in case they want to chime in ]
> >>>
> >>> Hi Juerg, thanks for the report!
> >>>
> >>> On Wed, 2021-02-10 at 11:48 +0100, Juerg Haefliger wrote:
> >>>> Trying to dump the BCM2711 registers kills the kernel:
> >>>>
> >>>> # cat /sys/kernel/debug/regmap/dummy-avs-monitor\@fd5d2000/range
> >>>> 0-efc
> >>>> # cat /sys/kernel/debug/regmap/dummy-avs-monitor\@fd5d2000/registers
> >>>>
> >>>> [ 62.857661] SError Interrupt on CPU1, code 0xbf000002 -- SError
> >>>
> >>> So ESR's IDS (bit 24) is set, which means it's an 'Implementation Defined
> >>> SError,' hence IIUC the rest of the error code is meaningless to anyone outside
> >>> of Broadcom/RPi.
> >>
> >> It's imp-def from the architecture's PoV, but the implementation in this
> >> case is Cortex-A72, where 0x000002 means an attributable, containable
> >> Slave Error:
> >>
> >> https://developer.arm.com/documentation/100095/0003/system-control/aarch64-register-descriptions/exception-syndrome-register--el1-and-el3?lang=en
> >>
> >> In other words, the thing at the other end of an interconnect
> >> transaction said "no" :)
> >>
> >> (The fact that Cortex-A72 gets too far ahead of itself to take it as a
> >> synchronous external abort is a mild annoyance, but hey...)
> >
> > Thanks for both your clarifications! Reading arm documentation is a skill on
> > its own.
>
> Yes it is.
>
> >
> >>> The regmap is created through the following syscon device:
> >>>
> >>> avs_monitor: avs-monitor@7d5d2000 {
> >>> compatible = "brcm,bcm2711-avs-monitor",
> >>> "syscon", "simple-mfd";
> >>> reg = <0x7d5d2000 0xf00>;
> >>>
> >>> thermal: thermal {
> >>> compatible = "brcm,bcm2711-thermal";
> >>> #thermal-sensor-cells = <0>;
> >>> };
> >>> };
> >>>
> >>> I've done some tests with devmem, and the whole <0x7d5d2000 0xf00> range is
> >>> full of addresses that trigger this same error. Also note that as per Florian's
> >>> comments[1]: "AVS_RO_REGISTERS_0: 0x7d5d2200 - 0x7d5d22e3." But from what I can
> >>> tell, at least 0x7d5d22b0 seems to be faulty too.
> >>>
> >>> Any ideas/comments? My guess is that those addresses are marked somehow as
> >>> secure, and only for VC4 to access (VC4 is RPi4's co-processor). Ultimately,
> >>> the solution is to narrow the register range exposed by avs-monitor to whatever
> >>> bcm2711-thermal needs (which is ATM a single 32bit register).
> >>
> >> When a peripheral decodes a region of address space, nobody says it has
> >> to accept accesses to *every* address in that space; registers may be
> >> sparsely populated, and although some devices might be "nice" and make
> >> unused areas behave as RAZ/WI, others may throw slave errors if you poke
> >> at the wrong places. As you note, in a TrustZone-aware device some
> >> registers may only exist in one or other of the Secure/Non-Secure
> >> address spaces.
> >>
> >> Even when there is a defined register at a given address, it still
> >> doesn't necessarily accept all possible types of access; it wouldn't be
> >> particularly friendly, but a device *could* have, say, some registers
> >> that support 32-bit accesses and others that only support 16-bit
> >> accesses, and thus throw slave errors if you do the wrong thing in the
> >> wrong place.
> >>
> >> It really all depends on the device itself.
> >
> > All in all, assuming there is no special device quirk to apply, the feeling I'm
> > getting is to just let the error be. As you hint, firmware has no blame here,
> > and debugfs is a 'best effort, zero guarantees' interface after all.
>
> We should probably fill a regmap_access_table to deny reading registers
> for which there is no address decoding and possibly another one to deny
> writing to the read-only registers.


Below is a patch that adds a read access table, but it seems wrong to include
the regmap core's private 'internal.h' and to add the table in the thermal
driver. Shouldn't this happen in a higher layer, somewhere between syscon and
the thermal node?

...Juerg


diff --git a/drivers/thermal/broadcom/bcm2711_thermal.c b/drivers/thermal/broadcom/bcm2711_thermal.c
index 6e2ff710b2ec..a831c33f6d9a 100644
--- a/drivers/thermal/broadcom/bcm2711_thermal.c
+++ b/drivers/thermal/broadcom/bcm2711_thermal.c
@@ -21,6 +21,7 @@
 #include <linux/thermal.h>

 #include "../thermal_hwmon.h"
+#include "../../base/regmap/internal.h"

 #define AVS_RO_TEMP_STATUS 0x200
 #define AVS_RO_TEMP_STATUS_VALID_MSK (BIT(16) | BIT(10))
@@ -67,6 +68,32 @@ static const struct of_device_id bcm2711_thermal_id_table[] = {
 };
 MODULE_DEVICE_TABLE(of, bcm2711_thermal_id_table);

+/* Readable register ranges.
+ * Ranges determined experimentally by reading every register. Non-readable
+ * register reads cause SError exceptions. */
+static const struct regmap_range bcm2711_thermal_rd_ranges[] = {
+	regmap_reg_range(0x000, 0x010),
+	regmap_reg_range(0x034, 0x044),
+	regmap_reg_range(0x068, 0x098),
+	regmap_reg_range(0x0ac, 0x0c8),
+	regmap_reg_range(0x100, 0x100),
+	regmap_reg_range(0x108, 0x108),
+	regmap_reg_range(0x110, 0x124),
+	regmap_reg_range(0x200, 0x2ac),
+	regmap_reg_range(0x2e0, 0x2e0),
+	regmap_reg_range(0x800, 0x810),
+	regmap_reg_range(0xd00, 0xd8c),
+	regmap_reg_range(0xdd0, 0xdd4),
+	regmap_reg_range(0xdf8, 0xe8c),
+	regmap_reg_range(0xed0, 0xed4),
+	regmap_reg_range(0xef8, 0xefc),
+};
+
+static const struct regmap_access_table bcm2711_thermal_rd_table = {
+	.yes_ranges = bcm2711_thermal_rd_ranges,
+	.n_yes_ranges = ARRAY_SIZE(bcm2711_thermal_rd_ranges),
+};
+
 static int bcm2711_thermal_probe(struct platform_device *pdev)
 {
 	struct thermal_zone_device *thermal;
@@ -90,6 +117,7 @@ static int bcm2711_thermal_probe(struct platform_device *pdev)
 		return ret;
 	}
 	priv->regmap = regmap;
+	priv->regmap->rd_table = &bcm2711_thermal_rd_table;

 	thermal = devm_thermal_zone_of_sensor_register(dev, 0, priv,
 						       &bcm2711_thermal_of_ops);
