Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"

From: Cristian Marussi
Date: Tue Oct 11 2022 - 06:40:59 EST


On Tue, Oct 11, 2022 at 03:34:45PM +0530, Shivnandan Kumar wrote:
>
> Hi Cristian,
>

Hi Shivnandan,

> >>Ok, just out of curiosity, once done, can you point me at your downstream
> public sources so I can see the issue and the fix that you are applying to
> your trees ?
>
> https://source.codeaurora.org/quic/la/kernel/msm-5.10/tree/drivers/soc/qcom/qcom_rimps.c?h=KERNEL.PLATFORM.1.0.r1-07800-kernel.0
>
> I have added lock while accessing con_priv inside irq handler and shutdown
> function.
>

Thanks !

>
> I have one input regarding timeouts from firmware: can we enable BUG on
> response timeout in function do_xfer based on some debug config flag? This
> will help to debug firmware timeout issues faster.
>
> We will only enable that config flag during internal testing.
>

I understand that a sort of 'panic-on-timeout' would be useful to freeze
the system as it is and debug, but it seems pretty invasive to me
(and is generally frowned upon) to BUG_ON timeouts, given that on some SCMI
platforms/transports a few timeouts can happen fairly often
due to transient conditions during moments of peak SCMI traffic.

Even though you mention making it conditional on a Kconfig option, I'm not
sure this could fly, especially if you want to enable it only for internal
testing... I'll ping Sudeep about this to see what he thinks.

As an alternative, what if I try to improve SCMI tracing/debug, say by
dumping more info in dmesg about the offending (timed-out) message instead
of hanging the system as a whole ?

I also have some still-brewing, unpublished patches to add some
SCMI stats somewhere in sysfs, so that current SCMI errors/timeouts
and transport anomalies can be read; would that be of interest ?

...maybe we could combine some of these stats and some sort of
BUG_ON/WARN_ON (if it eventually flies..) into some kind of SCMI_DEBUG mode
...any input on which kind of SCMI info you'd like to see
exposed by the stack would be welcome.

Last but not least, since we are talking about SCMI Server/FW testing,
have you (or your team) seen this work-in-progress of mine:

https://lore.kernel.org/linux-arm-kernel/20220903183042.3913053-1-cristian.marussi@xxxxxxx/

about a new unified userspace interface to inject/snoop SCMI messages to
test/fuzz/stress the SCMI server wherever it is placed ?

Any feedback on the API proposed in the cover-letter would be highly welcome;
I'll possibly post a new V4 next week, and the changes to the existing ARM SCMI
Compliance suite (mentioned in the cover) to support this new SCMI Raw
mode are in their final stage too.

Thanks,
Cristian