Re: possible deadlock in __ata_sff_interrupt

From: Waiman Long
Date: Fri Dec 16 2022 - 23:42:15 EST


On 12/16/22 22:05, Al Viro wrote:
> On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
>> Ok, let's bring in Waiman for the rwlock side.
>>
>> On Fri, Dec 16, 2022 at 5:54 PM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
>>> Right, for a reader not in_interrupt(), it may be blocked by a random
>>> waiting writer because of the fairness, even if the lock is currently
>>> held by a reader:
>>>
>>>   CPU 1                        CPU 2                            CPU 3
>>>   read_lock(&tasklist_lock);   // get the lock
>>>
>>>                                write_lock_irq(&tasklist_lock);  // wait for the lock
>>>
>>>                                                                 read_lock(&tasklist_lock);
>>>                                                                 // cannot get the lock
>>>                                                                 // because of the fairness
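
For reference, the in-interrupt exception to that fairness is the check at
the top of the qrwlock reader slowpath. Roughly this - simplified from
memory, tracing and most comments trimmed, see kernel/locking/qrwlock.c for
the real thing:

#include <linux/spinlock.h>
#include <linux/hardirq.h>

void queued_read_lock_slowpath(struct qrwlock *lock)
{
	if (unlikely(in_interrupt())) {
		/*
		 * A reader in interrupt context only waits for an *active*
		 * writer (_QW_LOCKED) and ignores writers that are merely
		 * queued; VAL is the re-read lock word, per the
		 * atomic_cond_read_acquire() convention.  This is the
		 * "unfair" path that keeps an interrupt taken in the middle
		 * of a read-side critical section from deadlocking against
		 * a waiting writer.
		 */
		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
		return;
	}

	/* Undo the optimistic reader increment done in the fast path. */
	atomic_sub(_QR_BIAS, &lock->cnts);

	/*
	 * Process-context readers queue on wait_lock behind any writer that
	 * got there first - that is the fairness CPU 3 runs into above.
	 */
	arch_spin_lock(&lock->wait_lock);
	atomic_add(_QR_BIAS, &lock->cnts);
	atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
	arch_spin_unlock(&lock->wait_lock);
}
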
>> But this should be ok - because CPU1 can make progress and eventually
>> release the lock.
>>
>> So the tasklist_lock use is fine on its own - the reason interrupts
>> are special is that an interrupt on CPU 1 taking the lock for
>> reading would deadlock otherwise. As long as it happens on another
>> CPU, the original CPU should then be able to make progress.
>>
>> But the problem here seems to be that *another* lock is also involved
>> (in this case apparently "host->lock"), and now if CPU1 and CPU2 get
>> these two locks in a different order, you can get an ABBA deadlock.
>>
>> And apparently our lockdep machinery doesn't catch that issue, so it
>> doesn't get flagged.
> Lockdep has actually caught that; the locks involved are mentioned in the
> report (https://marc.info/?l=linux-ide&m=167094379710177&w=2).  The form
> of the report might have been better, but if anything, what it fails to
> mention is the potential involvement of a tasklist_lock writer, which is
> what turns this into a deadlock.
>
> OTOH, that's more or less implicit for the entire class:
>
>   CPU 1                          CPU 2                  CPU 3
>   read_lock(A)  [non-interrupt]
>                                  local_irq_disable()    local_irq_disable()
>                                  spin_lock(B)           write_lock(A)
>                                  read_lock(A)
>   [in interrupt]
>   spin_lock(B)
>
> is what that sort of report is about.  In this case A is tasklist_lock,
> B is host->lock.  Possible call chains for CPU1 and CPU2 are reported...
>
> I wonder why analogues of that haven't been reported for other SCSI hosts -
> it's a really common pattern there...
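
To put the scenario in the diagram above into code: the functions below are
made up purely for illustration (the real call chains are the ones in the
lockdep report), with A = tasklist_lock and B = host->lock.

#include <linux/interrupt.h>
#include <linux/libata.h>
#include <linux/sched/signal.h>
#include <linux/spinlock.h>

/* CPU 1, process context: holds A for read when the interrupt arrives. */
static void cpu1_process_context(void)
{
	read_lock(&tasklist_lock);		/* A, non-interrupt */
	/* ... interrupt arrives here, see cpu1_irq_handler() ... */
	read_unlock(&tasklist_lock);
}

/* CPU 1, interrupt handler: spins on B, which CPU 2 already holds. */
static irqreturn_t cpu1_irq_handler(int irq, void *dev_instance)
{
	struct ata_host *host = dev_instance;

	spin_lock(&host->lock);			/* B: waits for CPU 2 */
	/* ... */
	spin_unlock(&host->lock);
	return IRQ_HANDLED;
}

/* CPU 2: takes B with interrupts off, then wants A for read. */
static void cpu2_path(struct ata_host *host)
{
	unsigned long flags;

	spin_lock_irqsave(&host->lock, flags);	/* B */
	read_lock(&tasklist_lock);		/* A: queued behind CPU 3 */
	/* ... */
	read_unlock(&tasklist_lock);
	spin_unlock_irqrestore(&host->lock, flags);
}

/* CPU 3: any tasklist_lock writer, e.g. fork/exit paths. */
static void cpu3_path(void)
{
	write_lock_irq(&tasklist_lock);		/* A: waits for CPU 1 */
	/* ... */
	write_unlock_irq(&tasklist_lock);
}

CPU2 cannot get A because CPU3's writer is queued ahead of it (the fairness
rule above), CPU3 cannot get A because CPU1 still holds it for read, and
CPU1 cannot drop it because its interrupt handler is spinning on B, which
CPU2 holds with interrupts disabled.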

>> I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
>> treats rwlocks as being _always_ unfair, not knowing about that "it's
>> only unfair when it's in interrupt context".
>>
>> Maybe we need to always make rwlock unfair? Possibly only for tasklist_lock?

That may not be a good idea, as the cacheline bouncing problem would come
back and reduce performance.

> ISTR threads about the possibility of an explicit read_lock_unfair()...

Another possible alternative is to treat a read_lock as unfair whenever
interrupts are disabled, since I think we should keep the interrupt-disabled
interval as short as possible anyway.
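
Something along these lines - an untested sketch against the
queued_read_lock_slowpath() shown earlier in this mail, just to show the
idea, not an actual patch:

	if (unlikely(in_interrupt() || irqs_disabled())) {
		/*
		 * Take the unfair path whenever the reader cannot be
		 * interrupted: it is either already in an interrupt handler
		 * or running with interrupts disabled.  Such a reader should
		 * hold the lock only briefly, which limits how long it can
		 * starve a waiting writer, and letting it jump the queue
		 * avoids blocking behind a writer while other locks are held
		 * with interrupts off.
		 */
		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
		return;
	}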

Thoughts?

Cheers,
Longman