Re: a major regression in recent kernels? - was: Re: Null pointer OOPS in sync_inodes_sb+0xa9/0x104

From: Jens Axboe
Date: Wed Mar 02 2011 - 19:15:18 EST


On 2011-03-02 13:31, Linus Torvalds wrote:
> Hmm. Adding some more people to the cc - in particular Jens. I'm not
> sure that he'd be on the fsdevel list (although maybe he is).
>
> The whole "backing_dev_info" has been a total disaster. The thing is
> crap. It violates all the normal kernel memory management rules ("Thou
> shalt use reference counts and free only when it goes to zero") and
> the whole thing has been a constant source of "oh, that driver didn't
> set it, but we changed all the code to require it to be correct".
>
> And the reason we set it to NULL when the device goes away is exactly
> that it's not ref-counted correctly, so we really _have_ to set it to
> NULL, because it's not going to be around.
>
> (And the reverse of that is why all kernel data structures should use
> refcounts, and not some external lifetime notion)
>
> Gaah. The caller has already done the check for a NULL s_bdi:
>
> /*
> * This should be safe, as we require bdi backing to actually
> * write out data in the first place
> */
> if (!sb->s_bdi || sb->s_bdi == &noop_backing_dev_info)
> return 0;
>
> before it calls into "sync_inodes_sb(sb);", but since that stupid
> disconnect event can happen at any time, that doesn't protect
> anything. The s_bdi field clearly just becomes NULL after the check.
>
> Jens? This was introduced in 592b09a42fc3 ("backing-dev: ensure that a
> removed bdi no longer has super_block referencing it") over a year
> ago, but the _real_ problem goes back all the way to commit
> 32a88aa1b6df which introduced that broken "s_bdi" without refcounts to
> begin with.
>
> Guys, any idea on how to fix this? The hacky way might be to introduce
> a dummy backing_dev_info and instead of setting s_bdi to NULL, we set
> it to the dummy one that doesn't do anything. It's still hacky as
> hell, though. The real problem really is that having pointers to
> structures without refcounts is just _always_ wrong.
>
> See Documentation/CodingStyle: "Chapter 11: Data structures":
>
> "Remember: if another thread can find your data structure, and you don't
> have a reference count on it, you almost certainly have a bug."
>
> wiser words have seldom been spoken.

Agree, from the ->s_bdi point of view it has been and is a mess. I guess
the hope was to avoid adding real reference counting for the bdi. I just
took a quick look at it, and I don't think it'll be too problematic to
add. The bdi is mostly just a settings container.

We already pretty much have a dummy backing_dev_info,
default_backing_dev_info. 2.6.38 and stable backport would be to use
that and going forward ensure we probably reference the bdi when it's
attached to the super block.

I've got a flight coming up tomorrow, I'll cook up both patches for
this.


--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/