Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater

From: Daniel Vetter
Date: Thu Oct 20 2016 - 10:35:00 EST


On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> On Thu, 20 Oct 2016 15:28:14 +0200,
> Ville Syrjälä wrote:
> >
> > On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> > > > Since the 4.7 kernel, we've seen error messages like
> > >
> > > kernel: [TTM] Buffer eviction failed
> > > kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001)
> > > kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> > >
> > > > on QXL when switching to and accessing a VT. The culprit is the generic
> > > > deferred_io code (the qxl driver switched to it in 4.7). There is a
> > > > race between the dirty clip update and the invocation of the callback.
> > >
> > > > In drm_fb_helper_dirty(), the dirty clip is updated under the spinlock,
> > > > but the update worker is kicked off outside the spinlock. Meanwhile
> > > > the update worker also clears the dirty clip under the spinlock. Thus,
> > > > when drm_fb_helper_dirty() is called concurrently, schedule_work() can
> > > > be called after the first worker run has already cleared the clip.
> >
> > Why does that matter? The first worker should have done all the
> > necessary work already, no?
>
> Before the first callback invocation, the worker clears the clip and
> passes the copied clip to the callback. The second invocation then gets
> the cleared and untouched clip, i.e. with x1=~0. This confuses
> qxl_framebuffer_dirty().
>
> Of course, we can filter this out on the callback side by checking the
> clip; that was actually my first version. But it's fundamentally a race
> and is better handled on the caller side.

Hm, I thought schedule_work() also schedules the work when it's being
processed right now. Which means if you're super unlucky you can still end
up with the work hitting an empty rectangle. I think filtering empty rects
in the worker is what we need to do instead.
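
For illustration, a minimal userspace sketch of that kind of filtering. This
is not the actual drm_fb_helper code: clip_rect, mark_dirty(), dirty_work()
and flush() are hypothetical stand-ins, and the spinlock is left out because
the sketch is single-threaded. It only assumes the reset state described
above (x1 = y1 = ~0, x2 = y2 = 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the helper's dirty clip, not the kernel struct. */
struct clip_rect {
	uint32_t x1, y1, x2, y2;
};

/* "Nothing dirty" state: x1/y1 at the maximum, x2/y2 at zero. */
static struct clip_rect dirty_clip = { UINT32_MAX, UINT32_MAX, 0, 0 };
static int flush_count;

static void clip_reset(struct clip_rect *c)
{
	c->x1 = c->y1 = UINT32_MAX;
	c->x2 = c->y2 = 0;
}

static bool clip_is_empty(const struct clip_rect *c)
{
	return c->x1 >= c->x2 || c->y1 >= c->y2;
}

/* Models the dirty updater: grow the clip to cover the new area. */
static void mark_dirty(uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
	if (x < dirty_clip.x1)
		dirty_clip.x1 = x;
	if (y < dirty_clip.y1)
		dirty_clip.y1 = y;
	if (x + w > dirty_clip.x2)
		dirty_clip.x2 = x + w;
	if (y + h > dirty_clip.y2)
		dirty_clip.y2 = y + h;
}

/* Stand-in for the driver callback, which must never see an empty rect. */
static void flush(const struct clip_rect *c)
{
	assert(!clip_is_empty(c));
	flush_count++;
}

/* Models the worker: copy the clip, reset it, skip the flush if empty. */
static void dirty_work(void)
{
	struct clip_rect copy = dirty_clip; /* done under the lock in real code */

	clip_reset(&dirty_clip);
	if (clip_is_empty(&copy))
		return; /* spurious run: nothing was marked dirty */
	flush(&copy);
}
```

With this check a second, spurious worker run is harmless no matter when
schedule_work() was called relative to the clip reset.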

Or is coffee not working right now?
-Daniel
>
>
> thanks,
>
> Takashi
>
> >
> > >
> > > The fix is simply moving schedule_work() inside the spinlock.
> > >
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98322
> > > Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1003298
> > > Fixes: eaa434defaca ('drm/fb-helper: Add fb_deferred_io support')
> > > Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
> > > ---
> > > drivers/gpu/drm/drm_fb_helper.c | 3 +--
> > > 1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> > > index 03414bde1f15..bae392dea2cc 100644
> > > --- a/drivers/gpu/drm/drm_fb_helper.c
> > > +++ b/drivers/gpu/drm/drm_fb_helper.c
> > > @@ -861,9 +861,8 @@ static void drm_fb_helper_dirty(struct fb_info *info, u32 x, u32 y,
> > > >  	clip->y1 = min_t(u32, clip->y1, y);
> > > >  	clip->x2 = max_t(u32, clip->x2, x + width);
> > > >  	clip->y2 = max_t(u32, clip->y2, y + height);
> > > > -	spin_unlock_irqrestore(&helper->dirty_lock, flags);
> > > > -
> > > >  	schedule_work(&helper->dirty_work);
> > > > +	spin_unlock_irqrestore(&helper->dirty_lock, flags);
> > > >  }
> > >
> > > /**
> > > --
> > > 2.10.1
> > >
> >
> > --
> > Ville Syrjälä
> > Intel OTC
> >

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch