Re: [PATCH] drm/i915: fix failure to power off after hibernate

From: Imre Deak
Date: Wed Feb 25 2015 - 13:33:45 EST


On ti, 2015-02-24 at 20:00 +0100, BjÃrn Mork wrote:
> Imre Deak <imre.deak@xxxxxxxxx> writes:
>
> > The poweroff handlers undo the actions of the thaw handlers. As the
> > original commit stated saving the registers is not needed there, but
> > it's also not a big overhead and there should be no problem doing it. We
> > are planning to optimize the hibernation sequence by for example not
> > shutting down the display between freeze and thaw, and also getting rid
> > of unnecessary steps at the power off phase. But before that we want to
> > actually unify things rather than having special cases, as maintaining
> > the special code paths caused already quite a lot of problems for us so
> > far.
>
> That sounds like a worthy goal. I don't understand what you hope to
> achieve by having a poweroff_late hook, since there aren't really
> anything useful left to do at the point it is called, but if you want a
> dummy callback there for code structure reasons then fine.

There are the following reasons for shutting down the device properly
during hibernation:
- ACPI mandates that the OSPM (the kernel in our case) puts all devices
into D3 that are not wake-up sources (i915 is not) (Kudos to Ville for
pointing this out)
- Embedded panels have a well defined shutdown sequence. We don't have
any good reason to not follow this, in fact for some panels the
subsequent reinitialization could be problematic in case of a hard
power-off. (Thanks to Jani for this info)

In fact the only reason why we didn't put the device into PCI D3 before
the patch you bisected, is kind of arbitrary. The PCI core puts every
device into D3 unless its driver saves the device's PCI state on its
own. Since before that patch we did save the state, but haven't put the
device into D3 it stayed in D0. That was definitely not intentional and
as such we depended on the BIOS to correct this for us afterwards.

> But you cannot just run around breaking stuff while slowly moving
> towards this goal over multiple releases... v3.19 is currently broken
> and that seems totally unnecessary.
>
> In any case: You should have noticed this problem while testing your
> patches. The breakage is 100% reproducible. Unfortunately I had to do a
> bisect to realize what you had done to the i915 driver, something I
> unfortunately didn't find time to do before v3.19 was released. But I
> do find it unnecessary to release with such bugs. Any attempt to
> exercise the code path you modified would have revealed the bug.

We tested these patches on numerous platforms and haven't seen the issue
you reported. Based on your feedback the current assumption is that this
is a bug in your BIOS, which tries to access the device despite it being
powered down. We're trying our best to test each change we commit on a
big set of platforms, but - especially on older hardware with random
BIOS versions/configurations - full coverage is not possible. So we
depend on reports like yours a lot to fill in the gaps.

> > Reverting the commit may hide some other issue, so before doing that
> > could you try the following patch:
> > http://lists.freedesktop.org/archives/intel-gfx/2015-February/060529.html
>
> Makes no difference. I assume that patch fixes an unrelated bug? The
> age and reported symptoms indicates so. Note that I am reporting a
> regression introduced after v3.18, while that seems to fix a bug
> introduced in v3.17. Both v3.17 and v3.18 (including v3.18.6), as
> well as earlier releases, work fine for me.

Ok, thanks for trying.

> > If with that you still get the hang could you try on top of that the
> > patch below, first having only pci_set_power_state uncommented, then
> > both pci_set_power_state and pci_disable_device uncommented?
>
> That patch fixes the problem, with only pci_set_power_state commented
> out. Do you still want me to try with pci_disable_device() commented
> out as well?

No, but it would help if you could still try the two attached patch
separately, without any of the previous workarounds. Based on the
result, we'll follow up with a fix that adds for your specific platform
either of the attached workarounds or simply avoids putting the device
into D3 (corresponding to the patch you already tried).

Thanks,
Imre
From fe8b90c976f14eab80cb6715d0405157163d316e Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak@xxxxxxxxx>
Date: Wed, 25 Feb 2015 19:38:53 +0200
Subject: [PATCH] drm/i915: zero PCI_COMMAND at the end of hibernation

Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_drv.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4badb23..b226cc6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -653,6 +653,8 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
pci_disable_device(drm_dev->pdev);
pci_set_power_state(drm_dev->pdev, PCI_D3hot);

+ pci_write_config_word(drm_dev->pdev, PCI_COMMAND, 0);
+
return 0;
}

--
2.1.0

From 18310629cbd32327edd8bc30969e23a7b20a3ce3 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak@xxxxxxxxx>
Date: Wed, 25 Feb 2015 19:43:33 +0200
Subject: [PATCH] drm/i915: use D3cold instead of D3hot during
suspend/hibernate

Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4badb23..807443a 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -651,7 +651,7 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
}

pci_disable_device(drm_dev->pdev);
- pci_set_power_state(drm_dev->pdev, PCI_D3hot);
+ pci_set_power_state(drm_dev->pdev, PCI_D3cold);

return 0;
}
--
2.1.0