Re: Lockdep warning on io_file_data_ref_zero() with 5.10-rc5

From: Xiaoguang Wang
Date: Tue Dec 15 2020 - 02:00:50 EST


hi,

On 11/28/20 5:13 PM, Pavel Begunkov wrote:
On 28/11/2020 23:59, Nadav Amit wrote:
Hello Pavel,

I got the following lockdep splat while rebasing my work on 5.10-rc5 on the
kernel (based on 5.10-rc5+).

I did not actually confirm that the problem is triggered without my changes,
as my iouring workload requires some kernel changes (not iouring changes),
yet IMHO it seems pretty clear that this is a result of your commit
e297822b20e7f ("io_uring: order refnode recycling”), that acquires a lock in
io_file_data_ref_zero() inside a softirq context.

Yeah, that's true. It was already reported by syzkaller and fixed by Jens, but
queued for 5.11. Thanks for letting know anyway!

https://lore.kernel.org/io-uring/948d2d3b-5f36-034d-28e6-7490343a5b59@xxxxxxxxx/T/#t


Jens, I think it's for the best to add it for 5.10, at least so that lockdep
doesn't complain.

Yeah maybe, though it's "just" a lockdep issue, it can't trigger any
deadlocks. I'd rather just keep it in 5.11 and ensure it goes to stable.
This isn't new in this series.
Sorry, I'm not familiar with lockdep implementation, here I wonder why you say
it can't trigger any deadlocks, looking at that the syzbot report, seems that
the deadlock may happen.

And I also wonder whether spin lock bh variants are enough, normal ios are
completed in interrupt context,
==> io_complete_rw
====> __io_complete_rw
======> io_complete_rw_common
========> __io_req_complete
==========> io_put_req
============> io_free_req
==============> __io_free_req
================> io_dismantle_req
==================> io_put_file
====================> percpu_ref_put(req->fixed_file_refs);
if we drop the last reference here,
io_file_data_ref_zero() will be called,
then we'll call spin_lock(&data->lock);
in interrupt context.

Should we use spin lock irq variants?

Regards,
Xiaoguang Wang