Re: [crash, PATCH] Revert "drm/radeon/kms: move radeon KMS on/off switch out of staging."

From: Jerome Glisse
Date: Tue Feb 02 2010 - 07:00:46 EST


On Tue, Feb 02, 2010 at 09:17:27AM +0100, Ingo Molnar wrote:
>
> * Dave Airlie <airlied@xxxxxxxx> wrote:
>
> > > Hi Linus,
> > >
> > > Please pull the 'drm-linus' branch from
> > > ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-linus
> > >
> >
> > I've also added an oops fix I seem to lose off my radar to this tree.
> >
> > commit 17aafccab4352b422aa01fa6ebf82daff693a5b3
> > Author: Michel Dänzer <daenzer@xxxxxxxxxx>
> > Date: Fri Jan 22 09:20:00 2010 +0100
> >
> > drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure.
>
> FYI, this drm pull into mainline has triggered quick boot crashes in -tip
> testing (even with the above fix applied), on an Athlon64 whitebox PC with:
>
> 01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)]
> 01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
>
> the crash is:
>
> [ 7.111003] radeon 0000:01:00.0: Disabling GPU acceleration
> [ 7.273547] Failed to wait GUI idle while programming pipes. Bad things might happen.
> [ 7.436296] [drm:r100_cp_fini] *ERROR* Wait for CP idle timeout, shutting down CP.
> [ 7.598755] Failed to wait GUI idle while programming pipes. Bad things might happen.
> [ 7.599306] BUG: unable to handle kernel paging request at f8380000
> [ 7.599999] IP: [<c149f0de>] rv370_pcie_gart_set_page+0x2d/0x3c
> [ 7.599999] *pde = 36d44067 *pte = 00000000
> [ 7.599999] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> [ 7.599999] last sysfs file:
>
> i have bisected it back to:
>
> | 97b94ccb9aa1b82ed7a9a045d0ae5b32c99b84a0 is the first bad commit
> | commit 97b94ccb9aa1b82ed7a9a045d0ae5b32c99b84a0
> | Author: Dave Airlie <airlied@xxxxxxxxxx>
> | Date: Fri Jan 29 15:31:47 2010 +1000
> |
> | drm/radeon/kms: fix incorrect logic in DP vs eDP connector checking.
> |
> | This makes displayport work again here.
>
> Unfortunately even with that patch reverted it still crashes. Config and
> bootlog attached.
>
> It's the moving of radeon KMS out of staging after -rc6 that causes it,
> because it brought it into the scope of my testing:
>
> f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging.
>
> So at least on this box it's clearly not ready for mainline enablement yet.
> I've attached the revert patch further below.
>
> Ingo
>

Attached is a patch that will fix the oops. Still, it's strange that the
CP fails to initialize on your config. Do you have an IOMMU enabled? I
haven't played with the IOMMU code, so I wonder if we are missing
something in this area.

Anyway, the root cause of the oops made me wonder whether we should
explore some kind of stack-based GPU block initialization: when we
initialize a GPU block we push it onto a stack, and when we want to
deinitialize we pop the stack, so blocks are always torn down in the
right (reverse) order. We have a lot of different paths in the driver,
and the failure paths are not as well tested as they could be. Such a
change is more of a long-term idea that might be worth looking into
(maybe for 2.6.35).
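A minimal sketch of the stack-based init/teardown idea above, assuming a generic table of blocks with init/fini callbacks. Every name here (struct gpu_block, init_stack_push, init_stack_unwind, gpu_init_blocks, the demo gart/cp/ib blocks) is hypothetical and is not part of the radeon driver; the demo simply simulates the CP failing to come up, as in the report above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in the radeon driver. */
#define MAX_BLOCKS 16

struct gpu_block {
	const char *name;
	int (*init)(void *ctx);   /* 0 on success, negative errno on failure */
	void (*fini)(void *ctx);
};

struct init_stack {
	const struct gpu_block *blocks[MAX_BLOCKS];
	int depth;
};

/* Record a block only after its init succeeded. */
static void init_stack_push(struct init_stack *s, const struct gpu_block *b)
{
	assert(s->depth < MAX_BLOCKS);
	s->blocks[s->depth++] = b;
}

/* Pop and tear down every recorded block, last-initialized first. */
static void init_stack_unwind(struct init_stack *s, void *ctx)
{
	while (s->depth > 0) {
		const struct gpu_block *b = s->blocks[--s->depth];
		if (b->fini)
			b->fini(ctx);
	}
}

/* Init blocks in order; on the first failure, unwind what was set up. */
static int gpu_init_blocks(struct init_stack *s, const struct gpu_block *blocks,
			   int n, void *ctx)
{
	for (int i = 0; i < n; i++) {
		int r = blocks[i].init ? blocks[i].init(ctx) : 0;
		if (r) {
			fprintf(stderr, "%s init failed (%d), unwinding\n",
				blocks[i].name, r);
			init_stack_unwind(s, ctx);
			return r;
		}
		init_stack_push(s, &blocks[i]);
	}
	return 0;
}

/* Demo context (hypothetical): records the order of fini calls. */
struct demo_ctx {
	int fail_cp;   /* nonzero: simulate the CP failing to initialize */
	char log[32];  /* one letter per fini call */
	int len;
};

static void log_fini(struct demo_ctx *d, char c)
{
	d->log[d->len++] = c;
	d->log[d->len] = '\0';
}

static int gart_init(void *ctx) { (void)ctx; return 0; }
static void gart_fini(void *ctx) { log_fini(ctx, 'G'); }
static int cp_init(void *ctx)
{
	return ((struct demo_ctx *)ctx)->fail_cp ? -ETIMEDOUT : 0;
}
static void cp_fini(void *ctx) { log_fini(ctx, 'C'); }
static int ib_init(void *ctx) { (void)ctx; return 0; }
static void ib_fini(void *ctx) { log_fini(ctx, 'I'); }

static const struct gpu_block demo_blocks[] = {
	{ "gart", gart_init, gart_fini },
	{ "cp",   cp_init,   cp_fini },
	{ "ib",   ib_init,   ib_fini },
};
```

With this shape, a block whose init never completed is never handed to its fini, and blocks that did complete are torn down newest-first. In the demo, a failing CP init tears down only the GART; a full bring-up followed by an unwind tears down IB, CP, then GART. The usual kernel idiom for the same guarantee is a chain of goto labels in the error path, which has to be kept in sync with the init order by hand.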

Cheers,
Jerome