Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

From: Mario Kleiner
Date: Wed Jan 20 2016 - 15:32:40 EST


On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
On 01/16/2016 05:24 AM, Mario Kleiner wrote:


On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:

I'm currently running...

while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done

... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770 and so far i
can't trigger a hang after hundreds of runs.

Does this also hang for you?

No, test mode seems to be fine.

I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
should probably give useful info around the time of the hang.

Attached. Captured by having kdm running, switching to console, running
"dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
frozen splashscreen, switch back, terminate dmesg. So somewhere around
the middle there should be where ksplashscreen starts...

Maybe also check XOrg.0.log for (WW) warnings related to flip.

No such warnings there.

thanks,
-mario


Thanks,
Vlastimil



Thanks. So the problem is that AMD's hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing the vblank count by one on each vblank irq. I think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.

The new code in drm_update_vblank_count() breaks this. The reset of the counter to zero is treated as a counter wraparound, so our software vblank counter jumps forward by up to 2^24 counts in response (in the case of AMD's 24-bit hw counters). The vblank event handling code in drm_handle_vblank_events() and other places then detects that the counter is more than 2^23 counts ahead of the queued vblank events and, as part of its own wraparound handling for the 32-bit software counter, doesn't deliver these queued events for a long time -> no vblank swap trigger event -> no swap -> client hangs waiting for swap completion.
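
To make the arithmetic concrete, here is a rough sketch of the two steps as I understand them. This is a simplified userspace model, not the kernel code: the helper names vblank_diff() and event_is_due() and the HW_COUNTER_MASK constant are made up for illustration, and only the masked subtraction and the 2^23 comparison are meant to mirror the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed width of the hardware frame counter: 24 bits. */
#define HW_COUNTER_MASK 0xffffffu

/*
 * Hypothetical model of the wraparound-tolerant delta computation:
 * any decrease of the hardware counter is read as a wrap, so a reset
 * to zero looks like a large forward jump.
 */
static uint32_t vblank_diff(uint32_t hw_last, uint32_t hw_now)
{
    assert(hw_last <= HW_COUNTER_MASK && hw_now <= HW_COUNTER_MASK);
    return (hw_now - hw_last) & HW_COUNTER_MASK;
}

/*
 * Hypothetical model of the event check: if the 32-bit software counter
 * is more than 2^23 counts past the event's target sequence, the event
 * is assumed to lie in the future (counter wrap) and is skipped.
 */
static bool event_is_due(uint32_t sw_count, uint32_t event_seq)
{
    return (sw_count - event_seq) <= (1u << 23);
}
```

With a hardware count of 1000 before the modeset and 0 after it, vblank_diff(1000, 0) yields 2^24 - 1000. Adding that to a software count of 5000 puts it far more than 2^23 past an event queued for sequence 5001, so event_is_due() returns false and the event stays queued. In this model a reset from a hardware count above 2^23 produces a jump below the threshold and the event would still be delivered, which would fit the hang depending on the count at login.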

I think I remember seeing the ksplash progress screen occasionally blanking halfway through login. I guess that's when kwin triggers a modeset in parallel to ksplash doing its OpenGL animations. So depending on the hw vblank count at the time of login, ksplash would or wouldn't hang; apparently I got "lucky" with my counts at login.

-mario