Re: possible deadlock in generic_file_write_iter (2)

From: Byungchul Park
Date: Wed Dec 06 2017 - 00:34:36 EST


On Tue, Dec 05, 2017 at 10:41:50AM +0100, Jan Kara wrote:
>
> Hello Byungchul,
>
> On Tue 05-12-17 13:58:09, Byungchul Park wrote:
> > On 12/4/2017 5:33 PM, Jan Kara wrote:
> > >adding Peter and Byungchul to CC since the lockdep report just looks
> > >strange and cross-release seems to be involved. Guys, how did #5 get into
> > >the lock chain and what does put_ucounts() have to do with sb_writers
> > >there? Thanks!
> >
> > Hello Jan,
> >
> > In order to get the full stack of #5, we have to pass the boot param
> > "crossrelease_fullstack" to the kernel. Since the call trace only
> > shows put_ucounts(), it's hard to find out what exactly happened at
> > that time, but I can tell that #5 shows:
>
> OK, thanks for the tip.
>
> > When acquire(sb_writers) happened in put_ucounts(), that context was
> > on its way to the complete((completion)&req.done) that the
> > wait_for_completion() in devtmpfs_create_node() is waiting for.
> >
> > If acquire(sb_writers) in put_ucounts() gets stuck, then
> > wait_for_completion() in devtmpfs_create_node() would be stuck as
> > well, since the complete() that comes after acquire(sb_writers) in
> > the same context can never be called.
>
> But this is something I don't get: There aren't sb_writers anywhere near
> put_ucounts(). So why the heck did lockdep think that sb_writers are
> acquired by put_ucounts()?

I also think it looks weird. I just record _RET_IP_ or _THIS_IP_ at the
time of acquire(sb_writers). Is it possible to get a wrong _RET_IP_ or
_THIS_IP_ by any chance?
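
For reference, a small userspace sketch of what the two macros capture.
The macro bodies mirror the kernel definitions of _RET_IP_ and _THIS_IP_
(GCC/Clang extensions); the SKETCH_ names, fake_acquire(),
inlined_helper(), outer() and this_ip_sample() are hypothetical and exist
only for illustration. It shows that the recorded address belongs to the
function that physically contains the call, so an inlined helper never
appears by name. It is not meant as an explanation of the put_ucounts()
entry in this report.

```c
#include <assert.h>

/* Mirrors the kernel's _RET_IP_: return address of the current function. */
#define SKETCH_RET_IP  ((unsigned long)__builtin_return_address(0))
/* Mirrors the kernel's _THIS_IP_: address of a local label placed here. */
#define SKETCH_THIS_IP ({ __label__ here_; here_: (unsigned long)&&here_; })

/* Hypothetical stand-in for the IP that an acquire would record. */
static unsigned long last_ip;

/* Kept out of line so that it has a real return address. */
static __attribute__((noinline)) void fake_acquire(void)
{
	last_ip = SKETCH_RET_IP;	/* points into the caller's code */
}

/* Always inlined: has no code of its own to be "the caller". */
static inline __attribute__((always_inline)) void inlined_helper(void)
{
	fake_acquire();
}

/* The recorded address lies in outer(), not in inlined_helper(). */
__attribute__((noinline)) unsigned long outer(void)
{
	inlined_helper();
	assert(last_ip != 0);	/* sanity check: something was recorded */
	return last_ip;
}

/* Returns an address inside this function's own body. */
__attribute__((noinline)) unsigned long this_ip_sample(void)
{
	return SKETCH_THIS_IP;
}
```

Resolving the value returned by outer() to a symbol would name outer(),
since that is where the call instruction ends up after inlining.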

>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR