[2/2] fs: Fix race between io_destroy() and io_submit() in AIO

From: Milton Miller
Date: Tue Feb 15 2011 - 11:58:50 EST


> A race can occur when io_submit() races with io_destroy():
>
> CPU1 CPU2
> io_submit()
> do_io_submit()
> ...
> ctx = lookup_ioctx(ctx_id);
> io_destroy()
> Now do_io_submit() holds the last reference to ctx.
> ...
> queue new AIO
> put_ioctx(ctx) - frees ctx with active AIOs
>
> We solve this issue by checking whether ctx is being destroyed
> in AIO submission path after adding new AIO to ctx. Then we
> are guaranteed that either io_destroy() waits for new AIO or
> we see that ctx is being destroyed and bail out.
>
> Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> CC: Nick Piggin <npiggin@xxxxxxxxx>
>
> ---
> fs/aio.c | 15 +++++++++++++++
> 1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index b4dd668..0244c04 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1642,6 +1642,21 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> goto out_put_req;
>
> spin_lock_irq(&ctx->ctx_lock);
> + /*
> + * We could have raced with io_destroy() and are currently holding a
> + * reference to ctx which should be destroyed. We cannot submit IO
> + * since ctx gets freed as soon as io_submit() puts its reference.
> + * The check here is reliable since io_destroy() sets ctx->dead before
> + * waiting for outstanding IO. Thus if we don't see ctx->dead set here,
> + * io_destroy() waits for our IO to finish.
> + * The check is inside ctx->ctx_lock to avoid extra memory barrier
> + * in this fast path...
> + */

When reading this comment, and with all of the recient discussions I
had with Paul in the smp ipi thread (especially with resepect to third
party writes), I looked to see that the spinlock was paired with the
spinlock to set dead in io_destroy. It is not. It took me some time
to find that the paired lock is actually in wait_for_all_aios. Also,
dead is also set in aio_cancel_all which is under the same spinlock.

Please update this lack of memory barrier comment to reflect the locking.

> + if (ctx->dead) {
> + spin_unlock_irq(&ctx->ctx_lock);
> + ret = -EINVAL;
> + goto out_put_req;
> + }
> aio_run_iocb(req);
> if (!list_empty(&ctx->run_list)) {
> /* drain the run list */

thanks,
milton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/