Re: drm/vkms: deadlock between dev->event_lock and timer

From: Helen Koike
Date: Mon Sep 18 2023 - 18:02:50 EST




On 14/09/2023 05:12, Daniel Vetter wrote:
On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote:
On 2023/09/14 6:08, Thomas Gleixner wrote:
Maybe the VKMS people need to understand locking in the first place. The
first thing I saw in this code is:

static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
{
...
mutex_unlock(&output->enabled_lock);

What?

Unlocking a mutex in the context of a hrtimer callback is simply
violating all mutex locking rules.

How has this code ever survived lock debugging without triggering a big
fat warning?

Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer
and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't
tested with the lock debugging yet...

Yeah that needs an immediate revert, there's not much that looks legit in
that patch. I'll chat with Maira.

Also yes how that landed without anyone running lockdep is ... not good. I
guess we need a lockdep enabled drm ci target that runs vkms tests asap
:-)

btw, I just executed a draft version of vkms targed on the ci, we do get the warning:

https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623

I'm just not sure if tests would fail (since it is a warning) and it has a chance to be ignored if people don't look at the logs (unless if igt already handles that).

I still need to do some adjustments (it seems the tests is hanging somewhere and we got a timeout) but we should have vkms in drm ci soon.

Regards,
Helen



Maíra and Arthur, mutex_unlock() from interrupt context is not permitted.
Please revert that patch immediately.
I guess that a semaphore (down()/up()) could be used instead of a mutex.

From a quick look this smells like a classic "try to use locking when you
want synchronization primitives", so semaphore here doesn't look any
better. The vkms_set_composer() function was originally for crc
generation, where it's userspace's job to make sure they wait for all the
crc they need to be generated before they shut it down again. But for
writeback the kernel must guarantee that the compositiona actually
happens, and the current function just doesn't make any such guarantees.

Cheers, Daniel