Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

From: Mike Galbraith
Date: Thu Dec 14 2017 - 21:40:36 EST


On Thu, 2017-12-14 at 22:54 +0100, Peter Zijlstra wrote:
> On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote:
>
> > Some time ago the block layer was changed to handle timeouts in thread context
> > instead of interrupt context. See also commit 287922eb0b18 ("block: defer
> > timeouts to a workqueue").
>
> That only makes it a little better:
>
> Task-A                                  Worker
>
> write_seqcount_begin()
> blk_mq_rw_update_state(rq, IN_FLIGHT)
> blk_add_timer(rq)
> <timer>
>   schedule_work()
> </timer>
> <context-switch to worker>
>                                         read_seqcount_begin()
>                                           while(seq & 1)
>                                             cpu_relax();
>
>
> Now normally this isn't fatal because Worker will simply spin its entire
> time slice away and we'll eventually schedule our Task-A back in, which
> will complete the seqcount and things will work.
>
> But if, for some reason, our Worker was to have RT priority higher than
> our Task-A we'd be up some creek without no paddles.

Most kthreads, including kworkers, are very frequently SCHED_FIFO here.

-Mike