Additionally, you must replace the sleep_on() calls with wait_event(), or an open-coded wait queue: sleep_on() is racy, because a wake-up that arrives between the condition check and the sleep_on() call is lost. It only works under cli().
IMHO the right way to fix cli() is:
- add a single spinlock to the driver or the device structure. Do not forget the spin_lock_init().
- replace cli/sti with spin_lock_irqsave/spin_unlock_irqrestore.
- Additionally acquire the spinlock in every interrupt handler (cli() stops interrupts on all cpus, spin_lock_irqsave only stops interrupts on the current cpu, so the handler can still run on another cpu).
- check if there were recursive cli() calls. Fix them: unlike cli(), a spinlock is not recursive, and taking it twice on the same cpu deadlocks.
- replace all sleep_on calls with wait queue calls.
- check if there are any calls that can sleep (kmalloc with GFP_KERNEL, schedule) in the area now under the spinlock, and reorganize the code so that they happen outside the lock.
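
A rough sketch of what the steps above might look like after the conversion. Everything named foo_* is made up for illustration, the buffer size is arbitrary, and the interrupt handler uses the two-argument signature of current kernels (older kernels also pass a struct pt_regs *). It is untested and only meant to show the shape of the change:

```c
#include <linux/errno.h>
#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

/* Hypothetical device structure, not taken from any real driver. */
struct foo_dev {
	spinlock_t lock;		/* replaces the cli()/sti() protection */
	wait_queue_head_t wq;		/* replaces the sleep_on() queue */
	int data_ready;			/* wait condition, written under lock */
	void *buf;			/* protected by lock */
};

static void foo_init(struct foo_dev *dev)
{
	spin_lock_init(&dev->lock);	/* do not forget this */
	init_waitqueue_head(&dev->wq);
	dev->data_ready = 0;
	dev->buf = NULL;
}

/* The handler takes the same lock: it may run on another cpu. */
static irqreturn_t foo_interrupt(int irq, void *dev_id)
{
	struct foo_dev *dev = dev_id;
	unsigned long flags;

	spin_lock_irqsave(&dev->lock, flags);
	dev->data_ready = 1;
	spin_unlock_irqrestore(&dev->lock, flags);

	wake_up_interruptible(&dev->wq);
	return IRQ_HANDLED;
}

static int foo_wait_for_data(struct foo_dev *dev)
{
	unsigned long flags;
	void *new_buf, *old_buf;
	int ret;

	/* Allocation that may sleep: done before the lock is taken. */
	new_buf = kmalloc(512, GFP_KERNEL);
	if (!new_buf)
		return -ENOMEM;

	/*
	 * wait_event_interruptible() re-checks the condition after the
	 * task is on the wait queue, so a wake-up between the check and
	 * the sleep is not lost (the race that sleep_on() has).
	 */
	ret = wait_event_interruptible(dev->wq, dev->data_ready);
	if (ret) {
		kfree(new_buf);
		return ret;
	}

	/* Short critical section, nothing in here can sleep. */
	spin_lock_irqsave(&dev->lock, flags);
	dev->data_ready = 0;
	old_buf = dev->buf;
	dev->buf = new_buf;
	spin_unlock_irqrestore(&dev->lock, flags);

	kfree(old_buf);			/* kfree(NULL) is a no-op */
	return 0;
}
```

Note that the sleeping parts (kmalloc with GFP_KERNEL, the wait itself) sit outside the locked region; only the flag and pointer updates are under the spinlock.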
And please add a changelog entry stating that the code was converted without testing.