drm/msm: Second DisplayPort regression in 6.8-rc1

From: Johan Hovold
Date: Mon Feb 19 2024 - 05:41:58 EST


On Sat, Feb 17, 2024 at 04:14:58PM +0100, Johan Hovold wrote:
> On Wed, Feb 14, 2024 at 02:52:06PM +0100, Johan Hovold wrote:
> > On Tue, Feb 13, 2024 at 10:00:13AM -0800, Abhinav Kumar wrote:

> Since Dmitry had trouble reproducing this issue I took a closer look at
> the DRM aux bridge series that Abhinav pointed and was able to track
> down the bridge regressions and come up with a reproducer. I just posted
> a series fixing this here:
>
> https://lore.kernel.org/lkml/20240217150228.5788-1-johan+linaro@xxxxxxxxxx/
>
> As I mentioned in the cover letter, I am still seeing intermittent hard
> resets around the time that the DRM subsystem is initialising, which
> suggests that we may be dealing with two separate DRM regressions here
> however.
>
> If the hard resets are triggered by something like unclocked hardware,
> perhaps that bit could this be related to the runtime PM rework?

It seems my initial suspicion that at least some of these regressions
were related to the runtime PM work was correct. The hard resets happens
when the DP controller is runtime suspended after being probed:

[ 16.748475] bus: 'platform': __driver_probe_device: matched device ae00000.display-subsystem with driver msm-mdss
[ 16.759444] msm-mdss ae00000.display-subsystem: Adding to iommu group 21
[ 16.795226] bus: 'platform': __driver_probe_device: matched device ae01000.display-controller with driver msm_dpu
[ 16.807542] probe of ae01000.display-controller returned -517 after 3 usecs
[ 16.821552] bus: 'platform': __driver_probe_device: matched device ae90000.displayport-controller with driver msm-dp-display
[ 16.837749] probe of ae90000.displayport-controller returned -517 after 1 usecs
[ OK ] Listening on Load/Save RF Kill Swit[ 16.854659] bus: 'platform': __dch Status /dev/rfkill Watch.
[ 16.868458] probe of ae98000.displayport-controller returned -517 after 2 usecs
[ 16.880012] bus: 'platform': __driver_probe_device: matched device aea0000.displayport-controller with driver msm-dp-display
[ 16.891856] probe of aea0000.displayport-controller returned -517 after 2 usecs
[ 16.903825] probe of ae00000.display-subsystem returned 0 after 144497 usecs
[ 16.911636] bus: 'platform': __driver_probe_device: matched device ae01000.display-controller with driver msm_dpu
[ 16.942092] probe of ae01000.display-controller returned 0 after 19593 usecs
Starting Load/Save Screen Backligh…rightness[ 16.959146] bus: 'platform': _ of backlight:backlight...
[ 16.995355] msm-dp-display ae90000.displayport-controller: dp_display_probe - probe tail
[ 17.004032] probe of ae90000.displayport-controller returned 0 after 30225 usecs
[ 17.012308] bus: 'platform': __driver_probe_device: matched device ae98000.displayport-controller with driver msm-dp-display
[ 17.050193] msm-dp-display ae98000.displayport-controller: dp_display_probe - probe tail
Starting Network Name Resolution...
[ 17.058925] probe of ae98000.displayport-controller returned 0 after 34774 usecs
[ 17.074925] bus: 'platform': __driver_probe_device: matched device aea0000.displayport-controller with driver msm-dp-display
[ Starting Network Time Synchronization...
[ 17.112000] msm-dp-display aea0000.displayport-controller: dp_display_probe - populate aux bus
[ 17.125208] msm-dp-display aea0000.displayport-controller: dp_pm_runtime_resume
Starting Record System Boot/Shutdown in UTMP...
Starting Virtual Console Setup...
[ OK ] Finished Load/Save Screen Backlight Brightness of backlight:backlight.
[ 17.197909] msm-dp-display aea0000.displayport-controller: dp_pm_runtime_suspend
[ 17.198079] probe of aea0Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.MXF.1.1-00470-MAKENA-1
S - IMAGE_VARIANT_STRING=SocMakenaWP
S - OEM_IMAGE_VERSION_STRING=crm-ubuntu92

< machine is reset by hypervisor >

Presumably the reset happens when controller is being shut down while
still being used by the EFI framebuffer.

In the cases where the machines survives boot, the controller is never
suspended.

When investigating this I've also seen intermittent:

[drm:dp_display_probe [msm]] *ERROR* device tree parsing failed

which also appears to be related to the runtime PM rework:

https://lore.kernel.org/lkml/1701472789-25951-1-git-send-email-quic_khsieh@xxxxxxxxxxx/

I believe this is enough evidence to conclude that this second
regression is introduced by commit 5814b8bf086a ("drm/msm/dp:
incorporate pm_runtime framework into DP driver"):

#regzbot introduced: 5814b8bf086a

Has anyone given some thought to how the framebuffer handover is
supposed to work? It seems we're currently just relying on luck with
timing.

Johan