Re: [Intel-gfx] [PATCH v5 2/2] drm/i915: Acquire PUNIT->PMIC bus for intel_uncore_forcewake_reset()

From: Ingo Molnar
Date: Tue Oct 31 2017 - 07:54:11 EST



* Daniel Vetter <daniel@xxxxxxxx> wrote:

> On Tue, Oct 31, 2017 at 10:50:06AM +0100, Ingo Molnar wrote:
> >
> > * Hans de Goede <j.w.r.degoede@xxxxxxxxx> wrote:
> >
> > > intel_uncore_forcewake_reset() does forcewake puts and gets; as such,
> > > we need to make sure that no one tries to access the PUNIT->PMIC bus
> > > (on systems where this bus is shared) while it runs, otherwise bad
> > > things happen.
> > >
> > > Normally this is taken care of by i915_pmic_bus_access_notifier(),
> > > which does an intel_uncore_forcewake_get(FORCEWAKE_ALL) when some other
> > > driver tries to access the PMIC bus, so that later forcewake gets are
> > > no-ops (for the duration of the bus access).
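[For context, a minimal sketch of what such a PMIC-bus-access notifier does,
based on the description above; the actual i915 code may differ in details
(e.g. runtime-PM handling around the forcewake get):

static int i915_pmic_bus_access_notifier(struct notifier_block *nb,
					 unsigned long action, void *data)
{
	struct drm_i915_private *dev_priv = container_of(nb,
			struct drm_i915_private, uncore.pmic_bus_access_nb);

	switch (action) {
	case MBI_PMIC_BUS_ACCESS_BEGIN:
		/*
		 * Grab all forcewake domains up front, so that any forcewake
		 * get issued while another driver owns the shared PUNIT->PMIC
		 * bus is a no-op and does not need the (busy) bus.
		 */
		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
		break;
	case MBI_PMIC_BUS_ACCESS_END:
		/* Bus access is done, drop the forcewake references again. */
		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
		break;
	}

	return NOTIFY_OK;
}
]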
> > >
> > > But intel_uncore_forcewake_reset() gets called in 3 cases:
> > > 1) Before registering the pmic_bus_access_notifier
> > > 2) After unregistering the pmic_bus_access_notifier
> > > 3) To reset forcewake state on a GPU reset
> > >
> > > In all 3 cases the i915_pmic_bus_access_notifier() protection is
> > > insufficient.
> > >
> > > This commit fixes this race by calling iosf_mbi_punit_acquire() before
> > > calling intel_uncore_forcewake_reset(). In the case where it is called
> > > directly after unregistering the pmic_bus_access_notifier, we need to
> > > hold the punit-lock over both calls to avoid a race where
> > > intel_uncore_fw_release_timer() may execute between the 2 calls.
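[The resulting ordering in the suspend/fini paths, mirroring the diff below,
with the lock held across both calls so the release timer cannot run in
between:

	iosf_mbi_punit_acquire();
	iosf_mbi_unregister_pmic_bus_access_notifier_unlocked(
			&dev_priv->uncore.pmic_bus_access_nb);
	intel_uncore_forcewake_reset(dev_priv, false);
	iosf_mbi_punit_release();
]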
> > >
> > > Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
> > > Reviewed-by: Imre Deak <imre.deak@xxxxxxxxx>
> > > ---
> > > Changes in v2:
> > > -Rebase on current (July 6th 2017) drm-next
> > >
> > > Changes in v3:
> > > -Keep the punit acquired / locked over the unregister + forcewake_reset
> > > call combo to avoid a race hitting in between the 2 calls
> > > -This requires modifying iosf_mbi_unregister_pmic_bus_access_notifier
> > > to not take the lock itself; since we are the only user, this is done
> > > in this same commit
> > >
> > > Changes in v4:
> > > -Fix missing rename in doc-comment
> > > -Add and use iosf_mbi_assert_punit_acquired() helper
> > > -Add missing acquire surrounding intel_uncore_forcewake_reset() inside
> > > intel_uncore_check_forcewake_domains()
> > > -Add Imre's Reviewed-by
> > >
> > > Changes in v5:
> > > -Separate out arch/x86 iosf_mbi changes into a separate patch
> > > ---
> > > drivers/gpu/drm/i915/intel_uncore.c | 17 +++++++++++++----
> > > drivers/gpu/drm/i915/selftests/intel_uncore.c | 3 +++
> > > 2 files changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > index 8c2ce81f01c2..0da81faf3981 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -229,6 +229,7 @@ intel_uncore_fw_release_timer(struct hrtimer *timer)
> > > return HRTIMER_NORESTART;
> > > }
> > >
> > > +/* Note callers must have acquired the PUNIT->PMIC bus, before calling this. */
> > > static void intel_uncore_forcewake_reset(struct drm_i915_private *dev_priv,
> > > bool restore)
> > > {
> > > @@ -237,6 +238,8 @@ static void intel_uncore_forcewake_reset(struct drm_i915_private *dev_priv,
> > > int retry_count = 100;
> > > enum forcewake_domains fw, active_domains;
> > >
> > > + iosf_mbi_assert_punit_acquired();
> > > +
> > > /* Hold uncore.lock across reset to prevent any register access
> > > * with forcewake not set correctly. Wait until all pending
> > > * timers are run before holding.
> > > @@ -416,14 +419,18 @@ static void __intel_uncore_early_sanitize(struct drm_i915_private *dev_priv,
> > > GT_FIFO_CTL_RC6_POLICY_STALL);
> > > }
> > >
> > > + iosf_mbi_punit_acquire();
> > > intel_uncore_forcewake_reset(dev_priv, restore_forcewake);
> > > + iosf_mbi_punit_release();
> > > }
> > >
> > > void intel_uncore_suspend(struct drm_i915_private *dev_priv)
> > > {
> > > - iosf_mbi_unregister_pmic_bus_access_notifier(
> > > + iosf_mbi_punit_acquire();
> > > + iosf_mbi_unregister_pmic_bus_access_notifier_unlocked(
> > > &dev_priv->uncore.pmic_bus_access_nb);
> > > intel_uncore_forcewake_reset(dev_priv, false);
> > > + iosf_mbi_punit_release();
> > > }
> > >
> > > void intel_uncore_resume_early(struct drm_i915_private *dev_priv)
> > > @@ -1315,12 +1322,14 @@ void intel_uncore_init(struct drm_i915_private *dev_priv)
> > >
> > > void intel_uncore_fini(struct drm_i915_private *dev_priv)
> > > {
> > > - iosf_mbi_unregister_pmic_bus_access_notifier(
> > > - &dev_priv->uncore.pmic_bus_access_nb);
> > > -
> > > /* Paranoia: make sure we have disabled everything before we exit. */
> > > intel_uncore_sanitize(dev_priv);
> > > +
> > > + iosf_mbi_punit_acquire();
> > > + iosf_mbi_unregister_pmic_bus_access_notifier_unlocked(
> > > + &dev_priv->uncore.pmic_bus_access_nb);
> > > intel_uncore_forcewake_reset(dev_priv, false);
> > > + iosf_mbi_punit_release();
> > > }
> > >
> > > static const struct reg_whitelist {
> > > diff --git a/drivers/gpu/drm/i915/selftests/intel_uncore.c b/drivers/gpu/drm/i915/selftests/intel_uncore.c
> > > index 3cac22eb47ce..733d87fe7737 100644
> > > --- a/drivers/gpu/drm/i915/selftests/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/selftests/intel_uncore.c
> > > @@ -148,7 +148,10 @@ static int intel_uncore_check_forcewake_domains(struct drm_i915_private *dev_pri
> > > for_each_set_bit(offset, valid, FW_RANGE) {
> > > i915_reg_t reg = { offset };
> > >
> > > + iosf_mbi_punit_acquire();
> > > intel_uncore_forcewake_reset(dev_priv, false);
> > > + iosf_mbi_punit_release();
> > > +
> > > check_for_unclaimed_mmio(dev_priv);
> > >
> > > (void)I915_READ(reg);
> >
> > This patch looks like one massive layering violation. Why does the GPU code muck
> > with the uncore hardware?
>
> Because the hw is an even worse layering violation. There's way too much
> "magic" stuff going on in the background, which then sometimes doesn't
> work and we end up implementing hacks in drivers to paper over it.

Ok, fair enough I guess:

Acked-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks,

Ingo