Re: [PATCH next] sbitmap: fix lockup while swapping

From: Jan Kara
Date: Mon Sep 26 2022 - 09:19:14 EST


On Fri 23-09-22 16:15:29, Hugh Dickins wrote:
> On Fri, 23 Sep 2022, Hugh Dickins wrote:
> > On Fri, 23 Sep 2022, Keith Busch wrote:
> >
> > > Does the following fix the observation? Rational being that there's no reason
> > > to spin on the current wait state that is already under handling; let
> > > subsequent clearings proceed to the next inevitable wait state immediately.
> >
> > It's running fine without lockup so far; but doesn't this change merely
> > narrow the window? If this is interrupted in between atomic_try_cmpxchg()
> > setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index,
> > don't we run the same risk as before, of sbitmap_queue_wake_up() from
> > the interrupt handler getting stuck on that wait_cnt 0?
>
> Yes, it ran successfully for 50 minutes, then an interrupt came in
> immediately after the cmpxchg, and it locked up just as before.
>
> Easily dealt with by disabling interrupts, no doubt, but I assume it's a
> badge of honour not to disable interrupts here (except perhaps in waking).

I don't think any magic with sbq_index_atomic_inc() is going to reliably
fix this. After all the current waitqueue may be the only one that has active
waiters so sbq_wake_ptr() will always end up returning this waitqueue
regardless of the current value of sbq->wake_index.

Honestly, this whole code needs a serious redesign. I have some
simplifications in mind but it will take some thinking and benchmarking so
we need some fix for the interim. I was pondering for quite some time about
some band aid to the problem you've found but didn't find anything
satisfactory.

In the end I see two options:

1) Take your patch (as wrong as it is ;). Yes, it can lead to lost wakeups
but we were living with those for a relatively long time so probably we can
live with them for some longer.

2) Revert Yu Kuai's original fix 040b83fcecfb8 ("sbitmap: fix possible io
hung due to lost wakeup") and my fixup 48c033314f37 ("sbitmap: Avoid leaving
waitqueue in invalid state in __sbq_wake_up()"). But then Keith would have
to redo his batched accounting patches on top.

> Some clever way to make the wait_cnt and wake_index adjustments atomic?
>
> Or is this sbitmap_queue_wake_up() interrupting sbitmap_queue_wake_up()
> just supposed never to happen, the counts preventing it: but some
> misaccounting letting it happen by mistake?

No, I think that is in principle a situation that we have to accommodate.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR