Re: [PATCH v2] drm: drm_probe_helper: Fix output_poll_work scheduling

From: Peter Ujfalusi
Date: Mon Dec 19 2016 - 07:16:24 EST


On 12/19/2016 11:54 AM, Jani Nikula wrote:
> On Wed, 31 Aug 2016, Daniel Vetter <daniel@xxxxxxxx> wrote:
>> On Wed, Aug 31, 2016 at 02:09:05PM +0300, Peter Ujfalusi wrote:
>>> drm_kms_helper_poll_enable_locked() should check if we have delayed event
>>> pending and if we have, schedule the work to run without delay.
>>>
>>> Currently the output_poll_work is only scheduled if any of the connectors
>>> have DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT with
>>> DRM_OUTPUT_POLL_PERIOD delay. It does not matter if we have delayed event
>>> already registered to be handled. The detection will be delayed by
>>> DRM_OUTPUT_POLL_PERIOD in any case.
>>> Furthermore if none of the connectors are marked as POLL_CONNECT or
>>> POLL_DISCONNECT because all connectors are either POLL_HPD or they are
>>> always connected: the output_poll_work will not run at all even if we
>>> have delayed event marked.
>>>
>>> When none of the connectors require polling, their initial status change
>>> from unknown to connected/disconnected is not going to be handled until
>>> the first kms application starts or if we have fb console enabled.
>>>
>>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@xxxxxx>
>>> ---
>>> Hi,
>>>
>>> Changes since v1:
>>> - dropped the last paragraph from the commit message.
>>
>> I added a few more words to the commit message to explain when exactly
>> this is a problem and applied your patch to drm-misc.
>
> Hi Peter, sadly looks like this regresses users out there [1]. Seems to
> be a reliable bisect. We need to have this fixed or reverted.

When I sent the patch I booted my laptop (with Intel Corporation Haswell-ULT
Integrated Graphics Controller (rev 0b)) and my old desktop with an nVidia Quadro
using the nouveau stack, and saw no issues.

But with 4.9 now on my Dell laptop I see lots of warnings during boot:
[ 0.573119] ------------[ cut here ]------------
[ 0.573127] WARNING: CPU: 3 PID: 874 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1d5/0x210
[ 0.573132] WARN_ON(!msg->buffer != !msg->size)
[ 0.573134] Modules linked in:
[ 0.573140] CPU: 3 PID: 874 Comm: kworker/u8:3 Not tainted 4.9.0-gentoo #1
[ 0.573143] Hardware name: Dell Inc. Latitude E7440/06MFX3, BIOS A14 02/02/2015
[ 0.573150] Workqueue: events_unbound async_run_entry_fn
[ 0.573154] ffffc900021afaf0 ffffffff813c3bad ffffc900021afb40 0000000000000000
[ 0.573160] ffffc900021afb30 ffffffff8106a596 0000042614b82908 ffffc900021afc00
[ 0.573166] ffff8802144040e0 0000000000000003 0000000000000000 ffff880214404158
[ 0.573172] Call Trace:
[ 0.573178] [<ffffffff813c3bad>] dump_stack+0x4f/0x72
[ 0.573182] [<ffffffff8106a596>] __warn+0xc6/0xe0
[ 0.573185] [<ffffffff8106a5fa>] warn_slowpath_fmt+0x4a/0x50
[ 0.573189] [<ffffffff815926e9>] ? intel_dp_aux_transfer+0xc9/0x210
[ 0.573193] [<ffffffff815927f5>] intel_dp_aux_transfer+0x1d5/0x210
[ 0.573198] [<ffffffff817d3e53>] ? _raw_write_unlock_irqrestore+0x13/0x30
[ 0.573202] [<ffffffff817d3e79>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[ 0.573207] [<ffffffff814b9e48>] drm_dp_dpcd_access+0x58/0xf0
[ 0.573210] [<ffffffff814b9ef6>] drm_dp_dpcd_write+0x16/0x20
[ 0.573214] [<ffffffff8158d8c3>] intel_dp_start_link_train+0x2b3/0x4a0
[ 0.573218] [<ffffffff8158ecd2>] intel_dp_check_link_status+0xb2/0xf0
[ 0.573222] [<ffffffff81593586>] intel_dp_detect+0x7d6/0xb40
[ 0.573226] [<ffffffff814bb1db>] drm_helper_probe_single_connector_modes+0x41b/0x4e0
[ 0.573233] [<ffffffff814c8d7c>] drm_fb_helper_initial_config+0x7c/0x3f0
[ 0.573237] [<ffffffff817d3e39>] ? _raw_spin_unlock_irq+0x9/0x10
[ 0.573242] [<ffffffff815858b3>] intel_fbdev_initial_config+0x13/0x30
[ 0.573245] [<ffffffff8108b832>] async_run_entry_fn+0x32/0xe0
[ 0.573249] [<ffffffff81083208>] process_one_work+0x148/0x4c0
[ 0.573253] [<ffffffff810835c3>] worker_thread+0x43/0x4e0
[ 0.573257] [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[ 0.573260] [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[ 0.573264] [<ffffffff8107f3c7>] ? call_usermodehelper_exec_async+0x137/0x140
[ 0.573269] [<ffffffff81088a45>] kthread+0xc5/0xe0
[ 0.573273] [<ffffffff81088980>] ? kthread_park+0x60/0x60
[ 0.573277] [<ffffffff8107f290>] ? umh_complete+0x40/0x40
[ 0.573280] [<ffffffff817d44f2>] ret_from_fork+0x22/0x30
[ 0.573285] ---[ end trace a544f5d689389b41 ]---

If I revert the patch in question I have tons of:

[ 0.569127] ------------[ cut here ]------------
[ 0.569136] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1d5/0x210
[ 0.569141] WARN_ON(!msg->buffer != !msg->size)
[ 0.569143] Modules linked in:
[ 0.569149] CPU: 3 PID: 564 Comm: kworker/3:2 Not tainted 4.9.0-gentoo #2
[ 0.569152] Hardware name: Dell Inc. Latitude E7440/06MFX3, BIOS A14 02/02/2015
[ 0.569159] Workqueue: events i915_hotplug_work_func
[ 0.569164] ffffc90001a93ba0 ffffffff813c3bad ffffc90001a93bf0 0000000000000000
[ 0.569174] ffffc90001a93be0 ffffffff8106a596 0000042614b12908 ffffc90001a93cb0
[ 0.569184] ffff88021482b0e0 0000000000000003 0000000000000000 ffff88021482b158
[ 0.569191] Call Trace:
[ 0.569198] [<ffffffff813c3bad>] dump_stack+0x4f/0x72
[ 0.569205] [<ffffffff8106a596>] __warn+0xc6/0xe0
[ 0.569210] [<ffffffff8106a5fa>] warn_slowpath_fmt+0x4a/0x50
[ 0.569214] [<ffffffff815926c9>] ? intel_dp_aux_transfer+0xc9/0x210
[ 0.569218] [<ffffffff815927d5>] intel_dp_aux_transfer+0x1d5/0x210
[ 0.569223] [<ffffffff817d3e53>] ? _raw_write_unlock_irqrestore+0x13/0x30
[ 0.569224] loop: module loaded
[ 0.569233] [<ffffffff817d3e79>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[ 0.569241] [<ffffffff814b9e48>] drm_dp_dpcd_access+0x58/0xf0
[ 0.569248] [<ffffffff814b9ef6>] drm_dp_dpcd_write+0x16/0x20
[ 0.569254] [<ffffffff8158d8a3>] intel_dp_start_link_train+0x2b3/0x4a0
[ 0.569261] [<ffffffff8158ecb2>] intel_dp_check_link_status+0xb2/0xf0
[ 0.569268] [<ffffffff81593566>] intel_dp_detect+0x7d6/0xb40
[ 0.569275] [<ffffffff8157c739>] i915_hotplug_work_func+0x1d9/0x2a0
[ 0.569283] [<ffffffff81083208>] process_one_work+0x148/0x4c0
[ 0.569290] [<ffffffff810835c3>] worker_thread+0x43/0x4e0
[ 0.569296] [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[ 0.569302] [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[ 0.569309] [<ffffffff81088a45>] kthread+0xc5/0xe0
[ 0.569316] [<ffffffff81088980>] ? kthread_park+0x60/0x60
[ 0.569322] [<ffffffff817d44f2>] ret_from_fork+0x22/0x30
[ 0.569330] ---[ end trace bc0abba135d2cb5d ]---

So the behaviour has changed, no question about it. But I still believe that
the patch itself fixes a valid bug (or shortcoming). At the moment I don't see
how it could lock up the kernel.
The only thing the patch does is schedule the output_poll_work without delay
if a delayed_event is already pending, to reduce the time it takes to handle
pending event(s) when polling is enabled for the first time.
From the log it is not clear to me why Xorg would lock up, but both the
intel and nouveau drivers are probed; could it be because there are two GPUs?
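
To make the intended change easier to reason about, here is a minimal
userspace sketch of the scheduling decision only. It is not the kernel code:
the POLL_* flags, POLL_PERIOD value, struct poll_decision and
poll_enable_decision() are hypothetical stand-ins for the DRM connector poll
flags, DRM_OUTPUT_POLL_PERIOD and the logic in
drm_kms_helper_poll_enable_locked(), and nothing is actually scheduled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRM connector poll flags. */
#define POLL_HPD        (1u << 0)
#define POLL_CONNECT    (1u << 1)
#define POLL_DISCONNECT (1u << 2)

/* Placeholder for DRM_OUTPUT_POLL_PERIOD; the value is illustrative only. */
#define POLL_PERIOD     10000UL

/* What the helper would ask the workqueue to do. */
struct poll_decision {
	bool schedule;        /* should output_poll_work be queued at all? */
	unsigned long delay;  /* delay to queue it with */
};

/*
 * Model of the decision with the patch applied.
 * polled[i] holds the poll flags of connector i.
 */
static struct poll_decision
poll_enable_decision(const unsigned int *polled, size_t n, bool delayed_event)
{
	struct poll_decision d = { .schedule = false, .delay = POLL_PERIOD };
	size_t i;

	/* Unchanged part: poll only if some connector asks for polling. */
	for (i = 0; i < n; i++) {
		if (polled[i] & (POLL_CONNECT | POLL_DISCONNECT))
			d.schedule = true;
	}

	/* Added by the patch: a pending event is handled without delay. */
	if (delayed_event) {
		d.schedule = true;
		d.delay = 0;
	}

	return d;
}
```

With only POLL_HPD connectors and a pending delayed_event, this model queues
the work with delay 0; before the patch the work would not have been queued
at all in that case. With a POLL_CONNECT connector and no pending event the
result is the same as before the patch (queued with the poll period).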

>
> BR,
> Jani.
>
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=98690
>
>
>
>>
>> Thanks, Daniel
>>>
>>> Regards,
>>> Peter
>>>
>>> drivers/gpu/drm/drm_probe_helper.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
>>> index a0df377d7d1c..f6b64d7d3528 100644
>>> --- a/drivers/gpu/drm/drm_probe_helper.c
>>> +++ b/drivers/gpu/drm/drm_probe_helper.c
>>> @@ -129,6 +129,7 @@ void drm_kms_helper_poll_enable_locked(struct drm_device *dev)
>>> {
>>> bool poll = false;
>>> struct drm_connector *connector;
>>> + unsigned long delay = DRM_OUTPUT_POLL_PERIOD;
>>>
>>> WARN_ON(!mutex_is_locked(&dev->mode_config.mutex));
>>>
>>> @@ -141,8 +142,13 @@ void drm_kms_helper_poll_enable_locked(struct drm_device *dev)
>>> poll = true;
>>> }
>>>
>>> + if (dev->mode_config.delayed_event) {
>>> + poll = true;
>>> + delay = 0;
>>> + }
>>> +
>>> if (poll)
>>> - schedule_delayed_work(&dev->mode_config.output_poll_work, DRM_OUTPUT_POLL_PERIOD);
>>> + schedule_delayed_work(&dev->mode_config.output_poll_work, delay);
>>> }
>>> EXPORT_SYMBOL(drm_kms_helper_poll_enable_locked);
>>>
>>> --
>>> 2.9.3
>>>
>


--
Péter