Re: [RFC/PATCH] drm/rockchip: don't wait for vblank if fb hasn't changed

From: John Keeping
Date: Wed Jan 13 2016 - 12:40:35 EST


On Wed, 13 Jan 2016 18:19:17 +0100, Daniel Vetter wrote:

> On Wed, Jan 13, 2016 at 04:40:38PM +0000, John Keeping wrote:
> > On Wed, 13 Jan 2016 17:21:56 +0100, Daniel Vetter wrote:
> >
> > > On Wed, Jan 13, 2016 at 03:55:29PM +0000, John Keeping wrote:
> > > > On Wed, 13 Jan 2016 16:40:05 +0100, Daniel Vetter wrote:
> > > >
> > > > > On Wed, Jan 13, 2016 at 02:34:25PM +0000, John Keeping wrote:
> > > > > > On Wed, 13 Jan 2016 15:23:20 +0100, Daniel Vetter wrote:
> > > > > >
> > > > > > > On Wed, Jan 13, 2016 at 12:53:34PM +0000, John Keeping wrote:
> > > > > > > > As commented in drm_atomic_helper_wait_for_vblanks(), userspace
> > > > > > > > relies on cursor ioctls being unsynced. Converting the rockchip
> > > > > > > > driver to atomic has significantly impacted cursor performance by
> > > > > > > > making every cursor update wait for vblank.
> > > > > > > >
> > > > > > > > By skipping the vblank sync when the framebuffer has not changed
> > > > > > > > (as is done in drm_atomic_helper_wait_for_vblanks()) we can avoid
> > > > > > > > this for the common case of moving the cursor and only need to
> > > > > > > > delay the cursor ioctl when the cursor icon changes.
> > > > > > > >
> > > > > > > > I originally inserted a check on legacy_cursor_update as well, but
> > > > > > > > that caused a storm of iommu page faults. I didn't investigate the
> > > > > > > > cause of those since this change gives enough of a performance
> > > > > > > > improvement for my use case.
> > > > > > > >
> > > > > > > > This is RFC because of that and because the framebuffer_changed()
> > > > > > > > function is copied from drm_atomic_helper.c as a quick way to test
> > > > > > > > the result.
> > > > > > > >
> > > > > > > > Signed-off-by: John Keeping <john@xxxxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/rockchip/rockchip_drm_fb.c | 27 +++++++++++++++++++++++++--
> > > > > > > > >  1 file changed, 25 insertions(+), 2 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> > > > > > > > > index f784488..8fd9821 100644
> > > > > > > > > --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> > > > > > > > > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> > > > > > > > > @@ -177,8 +177,28 @@ static void rockchip_crtc_wait_for_update(struct drm_crtc *crtc)
> > > > > > > > >  		crtc_funcs->wait_for_update(crtc);
> > > > > > > > >  }
> > > > > > > >
> > > > > > > > > +static bool framebuffer_changed(struct drm_device *dev,
> > > > > > > > > +				struct drm_atomic_state *old_state,
> > > > > > > > > +				struct drm_crtc *crtc)
> > > > > > > > > +{
> > > > > > > > > +	struct drm_plane *plane;
> > > > > > > > > +	struct drm_plane_state *old_plane_state;
> > > > > > > > > +	int i;
> > > > > > > > > +
> > > > > > > > > +	for_each_plane_in_state(old_state, plane, old_plane_state, i) {
> > > > > > > > > +		if (plane->state->crtc != crtc &&
> > > > > > > > > +		    old_plane_state->crtc != crtc)
> > > > > > > > > +			continue;
> > > > > > > > > +
> > > > > > > > > +		if (plane->state->fb != old_plane_state->fb)
> > > > > > > > > +			return true;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	return false;
> > > > > > > > > +}
> > > > > > >
> > > > > > > Please don't hand-roll logic that affects semantics like this. Instead
> > > > > > > please use drm_atomic_helper_wait_for_vblanks(), which should do this
> > > > > > > correctly for you.
> > > > > > >
> > > > > > > If that's not the case then we need to improve the generic helper, or
> > > > > > > > figure out what's different with rockchip.
> > > > > >
> > > > > > According to commit 63ebb9f (drm/rockchip: Convert to support atomic
> > > > > > API) it's because rockchip doesn't have a hardware vblank counter.
> > > > > >
> > > > > > I'm not entirely clear on why this prevents the use of
> > > > > > drm_atomic_helper_wait_for_vblanks().
> > > > >
> > > > > Hm, that commit isn't terribly helpful. If that's really needed then imo I
> > > > > think we should extract a "drm_atomic_helper_plane_needs_vblank_wait()"
> > > > > helper that's used by both. But since rockchip does vblank_get/put calls
> > > > > I'd hope vblanks actually work correctly. And then the helper should work
> > > > > too.
> > > >
> > > > I tried switching the call to rockchip_crtc_wait_for_update() to
> > > > drm_atomic_helper_wait_for_vblanks() and it works fine until I switch
> > > > the buffer associated with a cursor, at which point I get iommu page
> > > > faults, presumably because the GEM buffer is unreferenced too early.
> > > >
> > > > AFAICT the buffer will be released via drm_atomic_state_free()
> > > > unconditionally, but I suspect I'm missing something since that would
> > > > mean every driver would hit a similar problem.
> > >
> > > Yeah, with the helper we always skip, which means when the cursor bo
> > > changes you indeed unmap too early. So can't even share the overall
> > > condition, but we could definitely share the little framebuffer_changed
> > > helper.
> >
> > That leaves me with the question: why do other atomic drivers work?
> >
> > If drm_atomic_helper_wait_for_vblanks() skipping vblanks results in the
> > cursor bo being unmapped too early for rockchip, why is it not unmapped
> > too early for all of the other drivers using that helper?
>
> It's unmapped too early for everyone, it's just that normally that doesn't
> result in a fireworks show. What we maybe could/should do is do the
> unmapping asynchronously, but that runs into the overall "current atomic
> helpers don't do async yet" problem. Might be a good point to start fixing
> this up though.

OK, thanks, I think I'm beginning to understand how this all fits
together.

It looks like there are two options for me to get reasonable cursor
performance on rockchip in the short term:

1) Export the current framebuffer_changed() function as
drm_atomic_helper_framebuffer_changed() and use it in
rockchip_crtc_wait_for_update().

2) Add a mechanism to suppress the legacy_cursor_update check in
drm_atomic_helper_wait_for_vblanks() and switch the rockchip driver
over to it.

In both cases we only restore the unsynced cursor ioctl behaviour for
cursor moves; the ioctl will still be expensive when the cursor bo
changes, because it still has to wait for vblank. That gives sufficient
performance in my testing.
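
To make option 1 concrete, here is a rough standalone model of the
decision, written as plain userspace C rather than kernel code. All of
the model_* types and functions are made up for illustration and are not
DRM API; only the comparison logic mirrors the framebuffer_changed()
function from the patch quoted above. The vblank wait is replaced by a
counter so the behaviour can be checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for drm_framebuffer / drm_crtc (not the kernel types). */
struct model_fb { int id; };
struct model_crtc { int id; };

/* Hypothetical stand-in for the parts of drm_plane_state used here. */
struct model_plane_state {
	const struct model_crtc *crtc;
	const struct model_fb *fb;
};

/* One plane touched by a commit: its state before and after. */
struct model_plane_update {
	struct model_plane_state old_state;
	struct model_plane_state new_state;
};

/*
 * Same test as framebuffer_changed() in the quoted patch: true if any
 * plane that was or is on this crtc got a different framebuffer.
 */
bool model_framebuffer_changed(const struct model_plane_update *updates,
			       size_t count, const struct model_crtc *crtc)
{
	for (size_t i = 0; i < count; i++) {
		const struct model_plane_update *u = &updates[i];

		/* Ignore planes that never touch this crtc. */
		if (u->new_state.crtc != crtc && u->old_state.crtc != crtc)
			continue;

		if (u->new_state.fb != u->old_state.fb)
			return true;
	}

	return false;
}

/*
 * Models the proposed rockchip_crtc_wait_for_update() behaviour: only
 * "wait" (bump the counter) when a framebuffer changed on this crtc.
 * Returns true if a wait would have happened.
 */
bool model_wait_for_update(const struct model_plane_update *updates,
			   size_t count, const struct model_crtc *crtc,
			   int *vblank_waits)
{
	if (!model_framebuffer_changed(updates, count, crtc))
		return false;

	(*vblank_waits)++;
	return true;
}
```

In this model a cursor move (same fb before and after) never bumps the
counter, while swapping the cursor image does, which is the trade-off
described above.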