Re: [PATCH] iommu/arm-smmu: Demote error messages to debug in shutdown callback

From: Sai Prakash Ranjan
Date: Sat Mar 28 2020 - 03:35:21 EST


Hi Robin,

On 2020-03-28 00:32, Robin Murphy wrote:
> On 2020-03-27 3:09 pm, Sai Prakash Ranjan wrote:
>
> Imagine your network driver doesn't implement a .shutdown method (so
> the hardware is still active regardless of device links), happens to
> have an Rx buffer or descriptor ring DMA-mapped at an IOVA that looks
> like the physical address of the memory containing some part of the
> kernel text lower down that call stack, and the MAC receives a
> broadcast IP packet at about the point arm_smmu_device_shutdown() is
> returning. Enjoy debugging that ;)
>
> And if coincidental memory corruption seems too far-fetched for your
> liking, other fun alternatives might include "display tries to scan
> out from powered-off device, deadlocks interconnect and prevents
> anything else making progress", or "access to TZC-protected physical
> address triggers interrupt and over-eager Secure firmware resets
> system before orderly poweroff has a chance to finish".
>
> Of course the fact that in practice we'll *always* see the warning
> because there's no way to tear down the default DMA domains, and even
> if all devices *have* been nicely quiesced there's no way to tell, is
> certainly less than ideal. Like I say, it's not entirely clear-cut
> either way...


Thanks for these examples, it's good to know these scenarios in case we come across them.
However, if these error/warning messages appear every time, then what credibility will
they have? We will just ignore them when the issues you mention actually
appear, because we see them on every reboot or shutdown. So doesn't it make sense
to enable these only when we are debugging? One could ask how we would then know
that an issue might be related to the SMMU, but that's the case even now.

The reason this came up is that we had a NOC (interconnect) error, which does
have logging from the secure side, at least on QCOM platforms (it prints these on the console),
after the SMMU error messages, and there was confusion about whether it was related to these messages.
However, the NOC error messages did identify the issue as being with the USB, and it was solved later.
So these SMMU error/warning messages could be misleading almost every time, as in the above case.

The issues you mentioned are far less likely to occur than ordinary reboot and
shutdown scenarios. For example, we run reboot stress tests thousands of times, and these
messages don't add anything useful when an issue does occur, because they are seen
every time.

Thanks,
Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation