Re: possible deadlock in send_sigio

From: Waiman Long
Date: Mon Jun 15 2020 - 13:14:14 EST


On 6/15/20 12:49 PM, Matthew Wilcox wrote:
On Fri, Jun 12, 2020 at 03:01:01PM +0800, Boqun Feng wrote:
On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock, actually it's only recursive if in_interrupt() is true. So
change the annotation accordingly to catch more deadlocks.
[...]

+#ifdef CONFIG_LOCKDEP
+/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep think this way in selftests or
+ * 2. The implementation is not queued read/write lock or
+ * 3. The locker is at an in_interrupt() context.
+ */
+static inline bool read_lock_is_recursive(void)
+{
+ return force_read_lock_recursive ||
+ !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+ in_interrupt();
+}
I'm a bit uncomfortable with having the _lockdep_ definition of whether
a read lock is recursive depend on what the _implementation_ is.
The locking semantics should be the same, no matter which architecture
you're running on. If we rely on read locks being recursive in common
code then we have a locking bug on architectures which don't use queued
rwlocks.

I don't know whether we should just tell the people who aren't using
queued rwlocks that they have a new requirement or whether we should
say that read locks are never recursive, but having this inconsistency
is not a good idea!

Actually, qrwlock is more restrictive. It is possible that systems with qrwlock may hit deadlock which doesn't happens in other systems that use recursive rwlock. However, the current lockdep code doesn't detect those cases.

Changing lockdep to only use qrwlock semantics can be problematic as the code hunk in locking selftest is due to the fact that it assumes recursive lock. So we need to change that. Anyway, this patch can allow us to see if current qrwlock semantics may have potential deadlock problem in the current code. I actually have bug report about deadlock due to qrwlock semantics in RHEL7. So I would certainly like to see if the current upstream code may have also this kind of problem.

Cheers,
Longman