RE: [v2] drm/msm: add null checks for drm device to avoid crash during probe defer

From: Vinod Polimera
Date: Tue Sep 27 2022 - 03:31:45 EST




> -----Original Message-----
> From: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx>
> Sent: Friday, August 26, 2022 2:11 PM
> To: Vinod Polimera (QUIC) <quic_vpolimer@xxxxxxxxxxx>;
> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-arm-msm@xxxxxxxxxxxxxxx;
> freedreno@xxxxxxxxxxxxxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx; robdclark@xxxxxxxxx;
> dianders@xxxxxxxxxxxx; vpolimer@xxxxxxxxxxx; swboyd@xxxxxxxxxxxx;
> kalyant@xxxxxxxxxxx
> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
> during probe defer
>
> On 15/06/2022 15:23, Dmitry Baryshkov wrote:
> > On 03/06/2022 12:42, Vinod Polimera wrote:
> >> During probe defer, drm device is not initialized and an external
> >> trigger to shutdown is trying to clean up drm device leading to crash.
> >> Add checks to avoid drm device cleanup in such cases.
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at virtual
> >> address 00000000000000b8
> >>
> >> Call trace:
> >>
> >> drm_atomic_helper_shutdown+0x44/0x144
> >> msm_pdev_shutdown+0x2c/0x38
> >> platform_shutdown+0x2c/0x38
> >> device_shutdown+0x158/0x210
> >> kernel_restart_prepare+0x40/0x4c
> >> kernel_restart+0x20/0x6c
> >> __arm64_sys_reboot+0x194/0x23c
> >> invoke_syscall+0x50/0x13c
> >> el0_svc_common+0xa0/0x17c
> >> do_el0_svc_compat+0x28/0x34
> >> el0_svc_compat+0x20/0x70
> >> el0t_32_sync_handler+0xa8/0xcc
> >> el0t_32_sync+0x1a8/0x1ac
> >>
> >> Changes in v2:
> >> - Add fixes tag.
> >>
> >> Fixes: 623f279c778 ("drm/msm: fix shutdown hook in case GPU components
> >> failed to bind")
> >> Signed-off-by: Vinod Polimera <quic_vpolimer@xxxxxxxxxxx>
> >> ---
> >> drivers/gpu/drm/msm/msm_drv.c | 6 +++++-
> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/msm_drv.c
> >> b/drivers/gpu/drm/msm/msm_drv.c
> >> index 4448536..d62ac66 100644
> >> --- a/drivers/gpu/drm/msm/msm_drv.c
> >> +++ b/drivers/gpu/drm/msm/msm_drv.c
> >> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device *dev)
> >> struct msm_drm_private *priv = dev->dev_private;
> >> struct msm_kms *kms = priv->kms;
> >> + if (!irq_has_action(kms->irq))
> >> + return;
> >
> > As a second thought I'd still prefer a variable here. irq_has_action
> > would check that there is _any_ IRQ handler for this IRQ. While we do
> > not have anybody sharing this IRQ, I'd prefer to be clear here, that we
> > do not want to uninstall our IRQ handler rather than any IRQ handler.
>
> Vinod, do we still want to pursue this fix? If so, could you please
> update it according to the comment.
>
I looked into this and found that many kernel drivers use
irq_has_action() to check whether an interrupt has been requested, so it
appears to me to be an acceptable way of doing it. Having a variable to
track the state seems unnecessary, since it would need to be managed
race-free. Let me know your views on it.
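
To make the trade-off concrete, here is a minimal userspace sketch of the
two checks. It is only a model: every model_* name is hypothetical and
none of it is kernel API. model_line_has_action() stands in for the idea
behind irq_has_action() (is any handler installed on the line?), while
irq_requested stands in for a per-driver flag (did this driver install
its handler?). Since nobody shares this IRQ in the msm case, both checks
would give the same answer there; the model only shows where they would
differ if the line were shared.

```c
/* Userspace model only: the model_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ACTIONS 4

/* Stand-in for an IRQ line that may carry several handlers. */
struct model_irq_line {
	const void *dev_ids[MODEL_MAX_ACTIONS];
	size_t nr_actions;
};

/* Stand-in for a driver's private state. */
struct model_kms {
	struct model_irq_line *line;
	bool irq_requested;	/* set only when this driver installs its handler */
};

/* Models the "any handler on this line?" style of check. */
static bool model_line_has_action(const struct model_irq_line *line)
{
	return line->nr_actions > 0;
}

static int model_request_irq(struct model_irq_line *line, const void *dev_id)
{
	if (line->nr_actions == MODEL_MAX_ACTIONS)
		return -1;
	line->dev_ids[line->nr_actions++] = dev_id;
	return 0;
}

static void model_free_irq(struct model_irq_line *line, const void *dev_id)
{
	for (size_t i = 0; i < line->nr_actions; i++) {
		if (line->dev_ids[i] == dev_id) {
			/* Order does not matter here, so fill the hole with the last entry. */
			line->dev_ids[i] = line->dev_ids[--line->nr_actions];
			return;
		}
	}
}

static int model_kms_irq_install(struct model_kms *kms)
{
	int ret = model_request_irq(kms->line, kms);

	if (ret == 0)
		kms->irq_requested = true;
	return ret;
}

/* Flag-based uninstall: returns true only if our own handler was freed. */
static bool model_kms_irq_uninstall(struct model_kms *kms)
{
	if (!kms->irq_requested)
		return false;
	model_free_irq(kms->line, kms);
	kms->irq_requested = false;
	return true;
}

/* Another device owns the only handler; our driver never installed one. */
static void model_shared_line_demo(void)
{
	struct model_irq_line line = { .nr_actions = 0 };
	struct model_kms kms = { .line = &line, .irq_requested = false };
	int other_dev;

	assert(model_request_irq(&line, &other_dev) == 0);
	assert(model_line_has_action(&line));	/* "any handler" check passes */
	assert(!model_kms_irq_uninstall(&kms));	/* the flag correctly says "not ours" */
	assert(line.nr_actions == 1);		/* the other handler is untouched */
}
```

The model is single-threaded, so it says nothing about the locking a
real flag would need. That cost is the concern raised above.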
> >
> >> +
> >> kms->funcs->irq_uninstall(kms);
> >> if (kms->irq_requested)
> >> free_irq(kms->irq, dev);
> >> @@ -259,6 +262,7 @@ static int msm_drm_uninit(struct device *dev)
> >> ddev->dev_private = NULL;
> >> drm_dev_put(ddev);
> >> + priv->dev = NULL;
> >> destroy_workqueue(priv->wq);
> >> @@ -1167,7 +1171,7 @@ void msm_drv_shutdown(struct platform_device *pdev)
> >> struct msm_drm_private *priv = platform_get_drvdata(pdev);
> >> struct drm_device *drm = priv ? priv->dev : NULL;
> >> - if (!priv || !priv->kms)
> >> + if (!priv || !priv->kms || !drm)
> >> return;
> >> drm_atomic_helper_shutdown(drm);
> >
> >
>
> --
> With best wishes
> Dmitry

- Vinod P.