Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

From: Mario Kleiner
Date: Thu Jan 21 2016 - 03:28:44 EST


On 01/21/2016 07:38 AM, Michel Dänzer wrote:
On 21.01.2016 14:31, Mario Kleiner wrote:
On 01/21/2016 04:43 AM, Michel Dänzer wrote:
On 21.01.2016 05:32, Mario Kleiner wrote:

So the problem is that AMD's hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing the vblank
count by one in each vblank irq. I think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.

Right, looks like there's been a regression breaking this. I suspect the
problem is that vblank->last isn't getting updated from
drm_vblank_post_modeset. Not sure which change broke that though, or how
to fix it. Ville?


The whole logic has changed and the software counter updates are now
driven all the time by the hw counter.


BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
vblank counters"). I've been meaning to track that down since then; one
of these days hopefully, but if anybody has any ideas offhand...

I spent the last few hours reading through the DRM and radeon code and I
think what should probably work is to replace the
drm_vblank_pre/post_modeset calls in radeon/amdgpu with drm_vblank_off/on
calls. These are apparently meant for drivers whose hw counters reset
during modeset, [...]

... just like drm_vblank_pre/post_modeset. That those were broken is a
regression which needs to be fixed anyway. I don't think switching to
drm_vblank_on/off is suitable for stable trees.

Looking at Vlastimil's original post again, I'd say the most likely
culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
vblanks were missed").


Yes, I think reverting that one alone would likely fix it, since that restores the old vblank update logic.


Once drm_vblank_off is called, drm_vblank_get will no-op and return an
error, so clients can't enable vblank irqs during the modeset: the pageflip
and waitvblank ioctls would fail while a modeset is in progress.
Hopefully userspace handles this correctly everywhere.

We've fixed xf86-video-ati for this.


I'll hack up a patch for demonstration now.

You're a bit late to that party. :)

http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html



Oops. I just sent out my little (so far untested) patches. Yes, they are essentially the same as Daniel's. The only additions are a fix for the other potential small race I describe, by slightly moving the xxx_pm_compute_clocks() calls around, and a fix for a drm_vblank_get/put imbalance in radeon_pm when vblank_on/off is used.

-mario